Processes & Job Control
Why This Matters
You SSH into a production server and the monitoring alert says CPU usage is at 98%. Something is eating all your compute. Or maybe you started a long database migration over SSH and your connection dropped -- is the migration still running? Did it die? How do you reconnect to it?
Every command you type, every service running in the background, every daemon handling network requests -- they are all processes. Understanding how processes work, how to inspect them, how to control them, and how to keep them running when you walk away from the terminal is fundamental to Linux administration.
This chapter takes you from "what is a process?" to confidently managing foreground and background jobs, diagnosing runaway processes, and understanding the process lifecycle from birth to death.
Try This Right Now
# How many processes are running on your system right now?
ps aux | wc -l
# What is YOUR current shell's process ID?
echo $$
# What is your shell's parent process?
echo $PPID
# See the process tree rooted at PID 1
pstree -p 1 | head -30
# What is consuming the most CPU right now?
ps aux --sort=-%cpu | head -5
You will see that even a "quiet" Linux system has dozens or hundreds of processes running. Every one of them has a story.
What Is a Process?
A process is a running instance of a program. The program /usr/bin/bash is a file sitting on disk. When you launch a terminal, the kernel loads that program into memory, assigns it resources, and creates a process. If you open three terminals, you have three separate bash processes, each with its own memory, its own variables, its own PID.
Key Attributes of Every Process
+------------------------------------------+
| Process Attributes |
|------------------------------------------|
| PID - Process ID (unique number) |
| PPID - Parent Process ID |
| UID - User who owns it |
| State - Running, Sleeping, etc. |
| Nice - Priority value |
| Memory - How much RAM it uses |
| CPU - How much CPU time it uses |
| TTY - Terminal it is attached to |
| CMD - The command that started it |
+------------------------------------------+
How Processes Are Born: fork() and exec()
When you type ls in your shell, here is what actually happens:
Your bash shell (PID 1234)
|
| 1. fork() -- creates a COPY of itself
|
v
Child bash (PID 5678) <-- exact clone of parent
|
| 2. exec() -- replaces itself with the 'ls' program
|
v
ls process (PID 5678) <-- now running ls code
|
| 3. ls runs, outputs results
|
| 4. exit() -- process terminates
|
v
Parent bash (PID 1234) collects exit status
(the child is now gone)
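This fork-then-exec pattern is how nearly every process on Linux is created. The very first process, PID 1 (init or systemd), is started by the kernel. Every other user-space process is a descendant of PID 1; kernel threads, which ps shows in square brackets, descend from kthreadd (PID 2) instead.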
# See the full process tree to verify this
pstree -p
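The fork, exec, exit, and wait steps can be observed from the shell itself. The following is a minimal sketch, not a definitive demonstration; the variable names and the temporary file are illustrative choices, and it assumes sh and mktemp are available.

```shell
# fork + exec: a command launched from the shell is a child with a new PID,
# and its parent (PPID) is the shell that launched it.
tmp=$(mktemp)
sh -c 'echo "$$ $PPID"' > "$tmp"
read -r child_pid child_ppid < "$tmp"
rm -f "$tmp"
echo "shell PID: $$"
echo "child PID: $child_pid (parent: $child_ppid)"

# exec alone: the program is replaced, but the PID stays the same.
# The inner shell prints its PID, then execs a second shell that prints its own.
exec_pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
echo "PID before and after exec:" $exec_pids

# exit + wait: the parent collects the child's exit status.
(exit 7) &
wait $!
child_status=$?
echo "child exit status: $child_status"
```

The two numbers on the "before and after exec" line should be identical, which is the point of step 2 in the diagram: exec swaps the program, not the process.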
Think About It: If every process is created by forking, and PID 1 is the ancestor of all, what happens to a process if its parent dies before it does? (We will answer this when we discuss zombie and orphan processes.)
Inspecting Processes with ps
ps is the workhorse command for viewing processes. It has two major syntax styles (because Unix history is messy):
BSD Style (No Dashes)
# The "classic" incantation -- show ALL processes with details
ps aux
Output columns:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 169516 13200 ? Ss Feb10 0:07 /sbin/init
alice 1234 0.0 0.0 10056 3456 pts/0 Ss 09:15 0:00 -bash
alice 5678 45.2 2.1 987654 56789 pts/0 R+ 09:30 5:12 python train.py
| Column | Meaning |
|---|---|
| USER | Process owner |
| PID | Process ID |
| %CPU | CPU percentage |
| %MEM | Memory percentage |
| VSZ | Virtual memory size (KB) |
| RSS | Resident Set Size -- actual physical memory (KB) |
| TTY | Terminal (? means no controlling terminal, typical of daemons) |
| STAT | Process state (see below) |
| START | When the process started |
| TIME | Total CPU time consumed |
| COMMAND | The command line |
System V Style (With Dashes)
# Full-format listing
ps -ef
# Full-format with thread info
ps -eLf
Sample output of ps -ef:
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Feb10 ? 00:00:07 /sbin/init
alice 1234 1100 0 09:15 pts/0 00:00:00 -bash
Useful ps Recipes
# Find a specific process
ps aux | grep nginx
# Better: use pgrep (no grep-in-grep problem)
pgrep -a nginx
# Show process tree
ps auxf
# Show specific columns
ps -eo pid,ppid,user,%cpu,%mem,stat,cmd --sort=-%mem | head -20
# Show all processes by a specific user
ps -u alice
# Show process with specific PID
ps -p 1234 -o pid,ppid,stat,cmd
Process States
Every process is in one of several states. The STAT column in ps shows this:
+-------+------------------+------------------------------------------+
| Code | State | Meaning |
+-------+------------------+------------------------------------------+
| R | Running | Actively using CPU or ready to run |
| S | Sleeping | Waiting for an event (I/O, signal, etc.) |
| D | Uninterruptible | Waiting for I/O (cannot be killed!) |
| | Sleep | |
| Z | Zombie | Finished but parent hasn't collected |
| | | its exit status |
| T | Stopped | Suspended (e.g., Ctrl+Z) |
| t | Traced | Stopped by a debugger |
| I | Idle | Kernel thread, idle |
+-------+------------------+------------------------------------------+
Additional modifiers appear after the main state:
+-------+----------------------------------------------+
| Mod | Meaning |
+-------+----------------------------------------------+
| s | Session leader |
| + | In the foreground process group |
| l | Multi-threaded |
| < | High priority (negative nice value) |
| N | Low priority (positive nice value) |
+-------+----------------------------------------------+
So Ss means "sleeping, session leader." R+ means "running, foreground." Ssl means "sleeping, session leader, multi-threaded."
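To practice reading STAT values, the two tables can be turned into a small lookup. This is only a sketch: decode_stat is a hypothetical helper, not a standard tool, and it covers just the codes listed above.

```shell
# decode_stat: translate a ps STAT string using the two tables above.
decode_stat() {
  stat=$1
  rest=${stat#?}            # everything after the first character (the modifiers)
  main=${stat%"$rest"}      # the first character (the main state)
  case $main in
    R) out="running" ;;
    S) out="sleeping" ;;
    D) out="uninterruptible sleep" ;;
    Z) out="zombie" ;;
    T) out="stopped" ;;
    t) out="traced" ;;
    I) out="idle" ;;
    *) out="unknown state" ;;
  esac
  # Walk the remaining characters one at a time.
  while [ -n "$rest" ]; do
    tail=${rest#?}
    ch=${rest%"$tail"}
    case $ch in
      s) out="$out, session leader" ;;
      "+") out="$out, foreground" ;;
      l) out="$out, multi-threaded" ;;
      "<") out="$out, high priority" ;;
      N) out="$out, low priority" ;;
      *) out="$out, $ch (not in the tables above)" ;;
    esac
    rest=$tail
  done
  printf '%s\n' "$out"
}

decode_stat Ss
decode_stat R+
decode_stat Ssl
```

On a live system you could feed it real values, for example decode_stat "$(ps -o stat= -p $$)".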
The Dangerous D State
A process in state D (uninterruptible sleep) cannot be killed, not even with SIGKILL. It is waiting for I/O to complete -- usually disk or NFS. If you see many processes stuck in D state, you likely have a storage problem (dead NFS mount, failing disk, etc.).
# Find processes in D state
ps aux | awk '$8 ~ /D/'
Zombie Processes
A zombie (Z) is not actually running. It is a dead process whose entry still sits in the process table because its parent has not called wait() to collect its exit status. Zombies consume almost no resources (just a PID table entry), but a large number of them indicates a buggy parent process.
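# Find zombie processes (STAT may be Z, Z+, Zs, ... so match the first letter)
ps aux | awk '$8 ~ /^Z/'
# Or, with the parent PID shown so you know which process to fix
ps -eo pid,ppid,stat,cmd | awk 'NR == 1 || $3 ~ /^Z/'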
The fix for zombies is to fix or restart their parent process. Killing a zombie does nothing -- it is already dead.
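Since the parent is what needs fixing, it helps to group zombies by parent PID. A rough sketch, assuming a procps-style ps; count_zombies_by_parent is a hypothetical helper name, and it reads "PPID STAT" pairs on standard input.

```shell
# count_zombies_by_parent: read "PPID STAT" lines and print, for each parent
# that has zombie children, its PID followed by the number of zombies.
count_zombies_by_parent() {
  awk '$2 ~ /^Z/ { n[$1]++ } END { for (p in n) print p, n[p] }'
}

# Usage on a live system: empty headers (ppid=, stat=) suppress the header line.
ps -e -o ppid= -o stat= | count_zombies_by_parent
```

No output means no zombies. A parent PID with a steadily growing count is the process to inspect or restart.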
Real-Time Monitoring with top and htop
top
# Launch top
top
Key top commands while running:
| Key | Action |
|---|---|
| P | Sort by CPU |
| M | Sort by memory |
| k | Kill a process (enter PID) |
| r | Renice a process |
| 1 | Show individual CPU cores |
| c | Toggle full command path |
| f | Choose which fields to display |
| q | Quit |
htop (Better Interactive Viewer)
# Install if not present
sudo apt install htop # Debian/Ubuntu
sudo dnf install htop # RHEL/Fedora
# Launch
htop
htop provides:
- Color-coded CPU/memory bars
- Mouse support
- Horizontal scrolling for long commands
- Tree view (press F5)
- Easy filtering (press F4)
- Easy process killing (press F9)
Distro Note: htop is not installed by default on most distributions. It is available in the standard repositories for all major distros. Install it -- you will use it constantly.
Exploring /proc/PID/
Every running process has a directory under /proc/ named after its PID. This is a virtual filesystem -- the kernel generates the contents on the fly.
# Explore your own shell's process info
ls /proc/$$/
# What command started this process?
cat /proc/$$/cmdline | tr '\0' ' '; echo
# What is its current working directory?
ls -l /proc/$$/cwd
# What is its executable?
ls -l /proc/$$/exe
# What environment variables does it have?
cat /proc/$$/environ | tr '\0' '\n' | head -10
# What files does it have open?
ls -l /proc/$$/fd/
# Its memory map
cat /proc/$$/maps | head -10
# Process status summary
cat /proc/$$/status | head -20
The status file is especially useful:
cat /proc/$$/status
Key fields:
- Name: -- process name
- State: -- current state
- Pid: / PPid: -- PID and parent PID
- Uid: / Gid: -- real, effective, saved, and filesystem user and group IDs
- VmRSS: -- resident memory size
- Threads: -- number of threads
- voluntary_ctxt_switches: -- number of voluntary context switches
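Individual fields can be pulled out of the status file with a one-line awk filter. This is a sketch assuming a Linux /proc; proc_field is a hypothetical helper name, and it returns only the first value on the matching line.

```shell
# proc_field PID KEY: print the first value of "KEY:" in /proc/PID/status.
proc_field() {
  awk -v key="$2:" '$1 == key { print $2; exit }' "/proc/$1/status"
}

echo "name:     $(proc_field $$ Name)"
echo "state:    $(proc_field $$ State)"
echo "parent:   $(proc_field $$ PPid)"
echo "RSS (kB): $(proc_field $$ VmRSS)"
echo "threads:  $(proc_field $$ Threads)"
```

For Uid: and Gid: this prints only the real ID, because the line holds four values and the helper keeps the first.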
Think About It: The /proc/$$/fd/ directory shows all open file descriptors. File descriptor 0 is stdin, 1 is stdout, 2 is stderr. What do the symlinks point to when you are working in a terminal? What would they point to for a daemon process?
Foreground and Background Jobs
Running Commands in the Background
Add & to run a command in the background:
# Start a long-running command in the background
sleep 300 &
# Output: [1] 12345
# [1] is the job number, 12345 is the PID
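The same & works inside scripts, where job numbers are less convenient than PIDs. Here is a small sketch of running several tasks in parallel and collecting each exit status with wait; the subshells are stand-ins for real work and the exit codes are made up for illustration.

```shell
# Launch three background tasks and remember their PIDs.
pids=""
for code in 0 3 0; do
  ( sleep 1; exit "$code" ) &     # stand-in for a real task
  pids="$pids $!"                 # $! holds the PID of the most recent background job
done

# wait PID blocks until that child exits and returns its exit status.
statuses=""
for pid in $pids; do
  wait "$pid"
  statuses="$statuses $?"
done
echo "exit statuses:$statuses"
```

All three tasks run at the same time, so the whole script takes about one second rather than three.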
Managing Jobs
# List all background jobs in this shell
jobs
# Output: [1]+ Running sleep 300 &
# Bring a background job to the foreground
fg %1
# Suspend a foreground job (Ctrl+Z)
# Then see it in jobs list
jobs
# Output: [1]+ Stopped sleep 300
# Resume it in the background
bg %1
# Resume it in the foreground
fg %1
The Job Control Workflow
              Ctrl+Z
FOREGROUND ────────────> STOPPED
    ^                       |
    |                       |
  fg %N                   bg %N
    |                       |
    |                       v
    +────── BACKGROUND <────+
        (or launched with &)
Practical Example
# Start a large file copy in the foreground
cp -r /data/bigdir /backup/bigdir
# Oh wait, this is taking forever. Suspend it:
# Press Ctrl+Z
# Output: [1]+ Stopped cp -r /data/bigdir /backup/bigdir
# Resume it in the background so you can do other work
bg %1
# Output: [1]+ cp -r /data/bigdir /backup/bigdir &
# Now you can keep using the terminal
# Check on the job periodically
jobs
Keeping Processes Alive: nohup and disown
When a terminal session ends (you close the window or the SSH connection drops), the shell receives SIGHUP and forwards it to its jobs. The default action for SIGHUP is to terminate, so they die. That is fine for interactive commands, but terrible for long-running tasks.
nohup -- Plan Ahead
If you know beforehand that a command should survive logout:
# nohup redirects output to nohup.out and ignores SIGHUP
nohup python3 long_training.py &
# Or redirect output yourself
nohup ./backup_script.sh > /var/log/backup.log 2>&1 &
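You can see the effect without logging out by sending SIGHUP by hand. A minimal sketch, assuming the script itself is not already running with SIGHUP ignored; the sleep commands stand in for real long-running jobs.

```shell
# A plain background job: SIGHUP's default action terminates it.
sleep 10 &
plain=$!

# The same job under nohup: SIGHUP is set to "ignore" before sleep starts.
nohup sleep 10 > /dev/null 2>&1 &
protected=$!

sleep 1                              # give nohup time to exec sleep
kill -HUP "$plain" "$protected"      # simulate the hangup

wait "$plain"
plain_status=$?                      # 128 + signal number; SIGHUP is signal 1
if kill -0 "$protected" 2>/dev/null; then
  protected_alive=yes
else
  protected_alive=no
fi
echo "plain job exit status: $plain_status"
echo "nohup job still alive: $protected_alive"

kill "$protected" 2>/dev/null        # clean up
```

An exit status of 129 means the plain job was killed by signal 1 (SIGHUP), while the nohup job keeps running until it is cleaned up.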
disown -- After the Fact
You already started a job and forgot nohup? Use disown:
# Start a long job
python3 train_model.py &
# Output: [1] 34567
# Remove it from the shell's job table
disown %1
# Now you can log out safely -- the process will keep running
# disown all background jobs
disown -a
# Keep the job in the job table, but mark it so the shell never sends it SIGHUP
disown -h %1
Which to Use?
+-------------------------------------------------------------+
| Situation | Use |
|-----------------------------------|--------------------------|
| Starting a new long-running job | nohup command & |
| Already running, forgot nohup | Ctrl+Z, bg, disown |
| Need to reconnect to output | Use tmux or screen |
+-------------------------------------------------------------+
Think About It: Neither nohup nor disown lets you reconnect to the process output later. What tool from Chapter 26 (tmux) solves this problem? Why is tmux or screen often the better choice from the start?
Process Priority: nice and renice
Linux uses a priority system to decide how much CPU time each process gets. The "nice" value ranges from -20 (highest priority, least nice to other processes) to 19 (lowest priority, most nice):
-20 ─────────── 0 ─────────── 19
Highest priority Default Lowest priority
(least nice) (most nice)
Setting Priority at Launch
# Run a CPU-heavy command at low priority
nice -n 10 make -j$(nproc)
# Run at high priority (requires root)
sudo nice -n -5 ./critical-task
Changing Priority of Running Processes
# Find the PID
pgrep -a my_script
# Lower its priority (any user can increase niceness)
renice 15 -p 12345
# Raise its priority (requires root)
sudo renice -10 -p 12345
# Renice all processes by a user
sudo renice 10 -u testuser
Practical Example
# You need to compile a project but don't want it to slow down
# the web server running on the same machine
nice -n 19 make -j$(nproc)
# The backup (rsync) is starving other processes of CPU
# Lower its CPU priority (renice does not throttle disk I/O directly; see ionice for that)
renice 15 -p $(pgrep rsync)
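To confirm that a nice value actually took effect, read it back. The sketch below assumes a Linux /proc and permission to raise niceness to 15; niceness_of is a hypothetical helper that reads field 19 of /proc/PID/stat, which holds the nice value as long as the command name contains no spaces.

```shell
# niceness_of PID: print the nice value of a process (field 19 of /proc/PID/stat).
niceness_of() {
  awk '{ print $19 }' "/proc/$1/stat"
}

# nice with no command prints the niceness it is running with.
base=$(nice)
low=$(nice -n 10 nice)
echo "current niceness: $base, under 'nice -n 10': $low"

# Change the niceness of an already running process, then read it back.
sleep 30 &
pid=$!
renice 15 -p "$pid" > /dev/null
new_ni=$(niceness_of "$pid")
echo "sleep (PID $pid) niceness after renice: $new_ni"

kill "$pid"                          # clean up
```

Note that nice -n adds to the current niceness (capped at 19), while renice as used here sets an absolute value.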
Killing Processes
The kill Command
Despite its name, kill actually sends signals. We will cover signals in depth in Chapter 11, but here are the essentials:
# Send SIGTERM (graceful termination) -- this is the default
kill 12345
# Send SIGKILL (forceful termination -- cannot be caught)
kill -9 12345
# Or equivalently:
kill -KILL 12345
# Send SIGTERM to all processes with a given name
killall nginx
# Same, with pattern matching
pkill -f "python.*train"
# Kill all processes owned by a user
pkill -u testuser
WARNING: kill -9 should be your LAST resort. It does not give the process a chance to clean up -- temporary files may be left behind, database connections may not be closed properly, data may be corrupted. Always try kill (SIGTERM) first, wait a few seconds, and only use kill -9 if the process refuses to die.
The Escalation Ladder
# Step 1: Ask nicely (SIGTERM)
kill 12345
# Step 2: Wait a moment
sleep 5
# Step 3: Check if it is still running
ps -p 12345
# Step 4: Only if still running, force kill
kill -9 12345
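The ladder is easy to wrap in a function so the waiting and checking happen automatically. This is one possible sketch, not a standard utility: graceful_kill, its timeout argument, and its return codes are all illustrative choices.

```shell
# graceful_kill PID [TIMEOUT]
# Returns 0 if the process exited after SIGTERM, 2 if SIGKILL was needed,
# and 1 if the signal could not be sent (no such process, or no permission).
graceful_kill() {
  pid=$1
  timeout=${2:-5}
  waited=0
  kill -TERM "$pid" 2>/dev/null || return 1       # Step 1: ask nicely
  while kill -0 "$pid" 2>/dev/null; do            # Step 3: still running?
    if [ "$waited" -ge "$timeout" ]; then
      kill -KILL "$pid" 2>/dev/null               # Step 4: force kill
      return 2
    fi
    sleep 1                                       # Step 2: wait a moment
    waited=$(( waited + 1 ))
  done
  return 0
}

# A cooperative process dies on SIGTERM.
sleep 30 &
graceful_kill $! 3
coop_status=$?
echo "cooperative process: status $coop_status"

# A stubborn process ignores SIGTERM, so the function escalates.
(trap '' TERM; exec sleep 30) &
stubborn=$!
sleep 1                                           # let the subshell set its trap first
graceful_kill "$stubborn" 2
stubborn_status=$?
echo "stubborn process: status $stubborn_status"
```

kill -0 sends no signal at all; it only checks whether the PID exists and can be signalled, which makes it a convenient "is it still there?" test.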
Debug This: Diagnosing a High-CPU Process
Your monitoring says the system load is 12 on a 4-core machine. Diagnose it:
# Step 1: See system load averages
uptime
# Output: load average: 12.05, 11.30, 9.45
# Step 2: What is consuming CPU?
ps aux --sort=-%cpu | head -10
# Step 3: Is it a single process or many?
# If one process shows 400% CPU, it is multi-threaded
# If many processes show 100%, you have too many competing
# Step 4: Investigate the top consumer
# Say PID 5678 is at 350% CPU
cat /proc/5678/status | grep -E "Name|State|Threads|Pid|PPid"
# Step 5: What files does it have open?
ls -l /proc/5678/fd/ | head -20
# Step 6: What is it actually doing? (trace its system calls)
sudo strace -p 5678 -c
# Press Ctrl+C after a few seconds to see a summary
# Step 7: What started it? Check full command line
cat /proc/5678/cmdline | tr '\0' ' '; echo
# Step 8: If it is runaway garbage, kill it gracefully
kill 5678
sleep 3
# If still alive:
kill -9 5678
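For Step 3, summing CPU usage per command name makes it obvious whether one program or a crowd of them is responsible. A rough sketch, assuming a procps-style ps; cpu_by_command is a hypothetical helper, and it keeps only the first word of each command name.

```shell
# cpu_by_command: read "%CPU COMMAND" lines and print total %CPU per command,
# highest first.
cpu_by_command() {
  awk '{ cpu[$2] += $1 } END { for (c in cpu) printf "%.1f %s\n", cpu[c], c }' |
    sort -rn
}

# Usage on a live system (empty headers suppress the header line):
ps -e -o pcpu= -o comm= | cpu_by_command | head -5
```

One command with a large total and many instances (for example, dozens of workers) points to too many competing processes, while a single PID with a large value points to one runaway or multi-threaded process.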
Hands-On: Process Exploration Lab
# 1. Create some processes to observe
sleep 600 &
sleep 601 &
sleep 602 &
# 2. See them in your job list
jobs -l
# 3. See them in ps
ps aux | grep sleep
# 4. Look at their parent PID -- it should be your shell
echo "My shell PID: $$"
ps -p $(pgrep -d, -f "sleep 60[0-2]") -o pid,ppid,stat,cmd
# 5. Suspend one
kill -STOP $(pgrep -f "sleep 600")
# Check its state changed to T
ps -p $(pgrep -f "sleep 600") -o pid,stat,cmd
# 6. Resume it
kill -CONT $(pgrep -f "sleep 600")
# Check its state is back to S
ps -p $(pgrep -f "sleep 600") -o pid,stat,cmd
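# 7. Create a "zombie" (for educational purposes)
# The subshell starts a short-lived child, then execs into a sleep that never
# calls wait(), so the child stays a zombie until that sleep exits
(sleep 1 & exec sleep 5) &
sleep 2; ps -eo pid,ppid,stat,cmd | awk 'NR == 1 || $3 ~ /^Z/'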
# 8. Clean up
kill $(pgrep -f "sleep 60[0-2]") 2>/dev/null
What Just Happened?
+------------------------------------------------------------------+
| Chapter 10 Recap: Processes & Job Control |
|------------------------------------------------------------------|
| |
| - A process is a running instance of a program. |
| - Every process has a PID and a parent (PPID). |
| - Processes are created via fork()+exec(). |
| - PID 1 (init/systemd) is the ancestor of all processes. |
| - ps aux and ps -ef show process listings. |
| - Process states: R (running), S (sleeping), D (disk wait), |
| Z (zombie), T (stopped). |
| - Use & to background, Ctrl+Z to suspend, bg/fg to resume. |
| - nohup and disown keep processes alive after logout. |
| - nice/renice control scheduling priority (-20 to 19). |
| - kill sends signals; always try SIGTERM before SIGKILL. |
| - /proc/PID/ is a goldmine of per-process information. |
| - top/htop provide real-time monitoring. |
| |
+------------------------------------------------------------------+
Try This
Exercise 1: Process Genealogy
Run pstree -s -p $$ to see the chain of ancestors from PID 1 down to your shell (the -s option shows parents). Count how many generations separate your shell from PID 1. What are the intermediate processes?
Exercise 2: Resource Hog
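Write a command that consumes CPU (e.g., yes > /dev/null &). Launch three of them with different nice values (0, 10, 19). On a multi-core machine each one can get its own core, so pin all three to the same CPU (for example with taskset -c 0) to make them compete. Use top to observe how CPU time is distributed. Which one gets the most? Kill them all when done.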
Exercise 3: Job Control Mastery
Start five sleep 1000 commands in the background. Use jobs to list them. Bring the third one to the foreground, suspend it with Ctrl+Z, then resume it in the background. Kill the second one by job number (kill %2). Verify with jobs.
Exercise 4: /proc Deep Dive
Pick any running process and explore its /proc/PID/ directory. Read status, cmdline, environ, maps, and fd/. Write a one-paragraph summary of what this process is doing, based solely on what you found in /proc.
Bonus Challenge
Write a bash script called proc-report.sh that takes a PID as an argument and outputs: the command name, its parent's command name, its state, memory usage (RSS), number of open file descriptors, and how long it has been running. All information should come from /proc/PID/ and related files.