Memory Management
Why This Matters
Your monitoring dashboard shows the server using 95% of its 16 GB RAM. Should you panic? Maybe not. Linux aggressively uses free memory for disk caching. That 95% might include 8 GB of cache that can be reclaimed instantly when applications need it. But if applications are genuinely consuming all the RAM, the kernel's OOM (Out of Memory) killer will start terminating processes -- and it might choose your database.
Understanding Linux memory management is the difference between a calm "that is just cache" and a panicked "we need to add RAM immediately." It is the difference between knowing why your Java application was killed at 3 AM and being baffled by a mystery crash.
This chapter explains how Linux manages memory, how to read memory statistics correctly, how swap and the OOM killer work, and how to control memory usage with cgroups.
Try This Right Now
# See your memory usage
$ free -h
total used free shared buff/cache available
Mem: 16Gi 4.5Gi 3.2Gi 256Mi 8.3Gi 11Gi
Swap: 4.0Gi 0B 4.0Gi
# What does the kernel think?
$ cat /proc/meminfo | head -10
# Which processes use the most memory?
$ ps aux --sort=-%mem | head -10
# Current swap usage
$ swapon --show
# OOM score of a running process (pick any PID)
$ cat /proc/1/oom_score
Virtual Memory Recap
Every process in Linux believes it has its own private, contiguous block of memory. This is virtual memory -- an illusion maintained by the kernel and the CPU's Memory Management Unit (MMU).
┌──────────────────────────────────────────────────────────┐
│ VIRTUAL MEMORY │
│ │
│ Process A sees: Process B sees: │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ 0x0000 Code │ │ 0x0000 Code │ │
│ │ 0x1000 Data │ │ 0x1000 Data │ │
│ │ 0x2000 Heap │ │ 0x2000 Heap │ │
│ │ ... │ │ ... │ │
│ │ 0xFFFF Stack │ │ 0xFFFF Stack │ │
│ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │
│ │ Page Table │ Page Table │
│ ▼ Mapping ▼ Mapping │
│ ┌──────────────────────────────────────────┐ │
│ │ PHYSICAL RAM │ │
│ │ ┌────┬────┬────┬────┬────┬────┬────┐ │ │
│ │ │ A │ B │ A │ K │ B │ A │ B │ │ │
│ │ └────┴────┴────┴────┴────┴────┴────┘ │ │
│ │ Pages scattered across physical RAM │ │
│ └──────────────────────────────────────────┘ │
│ │
│ When physical RAM is full, the kernel swaps │
│ inactive pages to disk (swap space). │
└──────────────────────────────────────────────────────────┘
Key concepts:
- Memory is divided into pages (typically 4 KB each)
- The kernel maps virtual pages to physical RAM frames
- Processes can allocate more virtual memory than there is physical RAM (overcommit)
- When RAM is full, inactive pages are moved to swap on disk
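You can confirm the page size on your own machine:

```shell
# Page size in bytes (4096 on most x86-64 systems; some ARM64
# systems use 16384 or 65536)
getconf PAGESIZE

# Hugepage size, if the kernel supports hugepages
grep Hugepagesize /proc/meminfo
```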
The free Command: Reading It Correctly
$ free -h
total used free shared buff/cache available
Mem: 16Gi 4.5Gi 3.2Gi 256Mi 8.3Gi 11Gi
Swap: 4.0Gi 0B 4.0Gi
What Each Column Means
| Column | Meaning |
|---|---|
| total | Total physical RAM installed |
| used | RAM used by processes AND the kernel |
| free | RAM not being used for anything at all |
| shared | Memory used by tmpfs (shared memory) |
| buff/cache | Memory used for disk buffers and the file cache |
| available | Memory available for new processes (free + reclaimable cache) |
The Critical Insight: free vs available
Do not look at free. Look at available.
The free column shows memory that is completely unused. But Linux uses "free" memory for file caching -- keeping recently read file data in RAM so that future reads are fast. This cache is instantly reclaimable when applications need memory.
Example:
total = 16 GB
used = 4.5 GB (applications)
free = 3.2 GB (truly idle)
cache = 8.3 GB (file cache, reclaimable)
available = 11 GB (free + reclaimable cache)
Is this server out of memory? NO!
It has 11 GB available for applications.
The 8.3 GB of cache is HELPING performance, not wasting RAM.
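As a sketch of how you might act on this insight, the following parses the procps free output shown above (the field positions assume that format) and warns only when available -- not free -- runs low:

```shell
#!/bin/sh
# Warn when "available" drops below 10% of total RAM.
# In procps-ng "free -m" output, field 2 of the Mem: row is total
# and field 7 is available.
avail=$(free -m | awk '/^Mem:/ {print $7}')
total=$(free -m | awk '/^Mem:/ {print $2}')
if [ $((avail * 100 / total)) -lt 10 ]; then
    echo "WARNING: only ${avail} MiB of ${total} MiB available"
else
    echo "OK: ${avail} MiB of ${total} MiB available"
fi
```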
Think About It: A new administrator sees free showing 200 MB on a 64 GB server and wants to add more RAM. What would you tell them? What number should they actually check?
Buffers vs Cache
Both are forms of disk caching, but they serve different purposes:
$ cat /proc/meminfo | grep -E "^(Buffers|Cached|SReclaimable)"
Buffers: 123456 kB
Cached: 3504576 kB
SReclaimable: 245760 kB
Buffers: Cache for raw block device I/O (disk metadata, superblocks, directory entries). Small in size.
Cached: Cache for file content. When you read a file, its contents are kept in the page cache. This is usually the large portion.
SReclaimable: Slab memory that can be reclaimed (kernel data structures like inode cache, dentry cache).
# Watch cache fill up as you read files
$ free -h # note the buff/cache
$ sudo find /usr -type f -exec cat {} \; > /dev/null 2>&1
$ free -h # buff/cache should have grown
# Drop caches (for testing only, not for production!)
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ free -h # cache dropped, free increased
WARNING: Dropping caches in production causes performance degradation as files must be re-read from disk. Only do this for diagnostic purposes.
Swap: The Overflow Lane
Swap is disk space used as an extension of RAM. When physical memory is full, the kernel moves inactive pages to swap, freeing RAM for active processes.
# View current swap
$ swapon --show
NAME TYPE SIZE USED PRIO
/dev/sda3 partition 4G 0B -2
# Or from /proc
$ cat /proc/swaps
Filename Type Size Used Priority
/dev/sda3 partition 4194300 0 -2
# View swap in free output
$ free -h | grep Swap
Swap: 4.0Gi 0B 4.0Gi
Creating Swap Space
# Create a swap file (2 GB); if fallocate is unsupported on your
# filesystem, use: sudo dd if=/dev/zero of=/swapfile bs=1M count=2048
$ sudo fallocate -l 2G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
# Verify
$ swapon --show
# Make it permanent
$ echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
# Remove swap
$ sudo swapoff /swapfile
$ sudo rm /swapfile
# (and remove the fstab entry)
Swappiness: How Aggressively to Swap
The swappiness parameter controls how aggressively the kernel moves pages to swap:
# View current swappiness (default is usually 60)
$ cat /proc/sys/vm/swappiness
60
# Temporarily change it
$ sudo sysctl vm.swappiness=10
# Make it permanent
$ echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.d/99-swappiness.conf
$ sudo sysctl --system
| Swappiness | Behavior |
|---|---|
| 0 | Avoid swap unless absolutely necessary (kernel may still swap) |
| 10 | Swap only when under heavy memory pressure (good for databases) |
| 60 | Default -- balanced behavior |
| 100 | Aggressively swap (favors keeping cache over anonymous pages) |
For database servers and latency-sensitive applications, a low swappiness (10-20) is common because swapping causes latency spikes. For general-purpose servers, the default 60 is usually fine.
Think About It: Why might you NOT want to set swappiness to 0 on a database server? What happens if a memory leak slowly consumes all RAM and there is no swap?
The OOM Killer
When the system runs out of both RAM and swap, the kernel invokes the OOM (Out of Memory) Killer. Its job is to kill processes to free memory and keep the system alive.
How the OOM Killer Chooses Victims
Every process has an OOM score from 0 to 1000. The process with the highest score gets killed first.
# View OOM score of a process
$ cat /proc/1234/oom_score
15
# View OOM score adjustment
$ cat /proc/1234/oom_score_adj
0
The OOM score is calculated based on:
- How much memory the process is using (RSS, swap, and page tables -- more memory = higher score)
- The oom_score_adj adjustment (-1000 to +1000)

On modern kernels the score is essentially memory footprint plus the adjustment; older kernels also factored in runtime (shorter-lived processes scored higher).
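Putting those factors together, here is a small sketch that ranks the likeliest OOM victims by reading oom_score directly from /proc (no extra tools needed):

```shell
# List the five processes the OOM killer would target first.
# Unreadable /proc entries (processes that exited mid-loop) are skipped.
for pid in /proc/[0-9]*; do
    score=$(cat "$pid/oom_score" 2>/dev/null) || continue
    comm=$(cat "$pid/comm" 2>/dev/null) || continue
    echo "$score ${pid#/proc/} $comm"
done | sort -rn | head -5
```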
Protecting Critical Processes from OOM
# Make a process immune to the OOM killer
$ echo -1000 | sudo tee /proc/1234/oom_score_adj
# Make a process more likely to be killed
$ echo 500 | sudo tee /proc/5678/oom_score_adj
Common strategy:
| Process | oom_score_adj | Rationale |
|---|---|---|
| sshd | -1000 | Never kill SSH -- you need it to fix things |
| database | -500 | Protect critical data |
| web server | 0 | Default priority |
| batch jobs | 500 | Kill these first |
Setting OOM Protection in systemd Services
# In a systemd service file
[Service]
OOMScoreAdjust=-1000
Detecting OOM Kills
# Check kernel logs for OOM events
$ dmesg | grep -i "oom"
[12345.678901] Out of memory: Killed process 5678 (java) total-vm:8388608kB, anon-rss:6291456kB
# Or in journald
$ journalctl -k | grep -i oom
# Check if a specific process was OOM-killed
$ journalctl -k | grep "Killed process"
OOM Anatomy: What an OOM Kill Looks Like
Jan 18 14:30:00 server kernel: java invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE)
Jan 18 14:30:00 server kernel: Mem-Info:
Jan 18 14:30:00 server kernel: active_anon:4012345 inactive_anon:23456 ...
Jan 18 14:30:00 server kernel: Free swap = 0kB
Jan 18 14:30:00 server kernel: Total swap = 4194300kB
Jan 18 14:30:00 server kernel: Out of memory: Killed process 5678 (java)
total-vm:8388608kB, anon-rss:6291456kB, file-rss:12345kB
This tells you:
- java triggered the OOM killer
- Swap is 100% full (Free swap = 0kB)
- The java process was using ~6 GB of RSS (resident set size)
/proc/meminfo Deep Dive
$ cat /proc/meminfo
Key fields explained:
| Field | Meaning |
|---|---|
| MemTotal | Total usable RAM (slightly less than physical due to kernel reservations) |
| MemFree | Completely unused RAM |
| MemAvailable | Estimated memory available for applications (includes reclaimable cache) |
| Buffers | Raw disk block cache |
| Cached | File content cache (page cache) |
| SwapCached | Swap data also in RAM (avoids re-reading from swap) |
| Active | Recently accessed memory (less likely to be reclaimed) |
| Inactive | Not recently accessed (candidate for reclamation or swap) |
| Active(anon) | Anonymous pages (application heap/stack) recently used |
| Inactive(anon) | Anonymous pages not recently used (swap candidates) |
| Active(file) | File-backed pages recently used |
| Inactive(file) | File-backed pages (reclaimable without swap) |
| Dirty | Pages modified in memory but not yet written to disk |
| Slab | Kernel data structure cache |
| SReclaimable | Slab memory that can be freed |
| SUnreclaim | Slab memory that cannot be freed |
| Committed_AS | Total memory committed (promised) to processes |
| VmallocTotal | Total vmalloc address space |
| HugePages_Total | Number of hugepages allocated |
# Quick check: is the system under memory pressure?
$ awk '/MemAvailable/ {avail=$2} /MemTotal/ {total=$2} END {printf "%.1f%% available\n", avail/total*100}' /proc/meminfo
68.3% available
Finding Memory-Hungry Processes
Using ps
# Top 10 memory consumers by RSS (Resident Set Size)
$ ps aux --sort=-%mem | head -11
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
mysql 1234 2.3 12.5 2415648 2048000 ? Ssl Jan15 48:23 /usr/sbin/mysqld
www-data 5678 0.8 5.2 892416 851968 ? S Jan15 12:45 /usr/sbin/apache2
redis 9012 0.1 3.1 634567 507904 ? Ssl Jan15 2:34 /usr/bin/redis-server
# VSZ = Virtual Size (address space, often much larger than actual usage)
# RSS = Resident Set Size (actual physical RAM used)
# %MEM = RSS as a percentage of total RAM
Using smem (More Accurate)
# Install smem
$ sudo apt install smem
# smem accounts for shared memory properly
$ sudo smem -rkt -s pss | head -10
PID User Command Swap USS PSS RSS
1234 mysql /usr/sbin/mysqld 0 1.90G 1.92G 1.95G
5678 www-data /usr/sbin/apache2 0 78.5M 120.3M 234.5M
- USS (Unique Set Size): memory uniquely owned by this process.
- PSS (Proportional Set Size): USS plus a proportional share of shared memory.
- RSS (Resident Set Size): all physical memory used, including shared pages.
PSS is the most accurate measure of a process's true memory footprint.
Using /proc/PID/status
$ cat /proc/1234/status | grep -E "^(Name|VmRSS|VmSize|VmSwap|RssAnon|RssFile)"
Name: mysqld
VmSize: 2415648 kB # Virtual memory size
VmRSS: 2048000 kB # Physical memory in use
VmSwap: 0 kB # Memory swapped to disk
RssAnon: 1900000 kB # Anonymous (heap/stack) memory
RssFile: 148000 kB # File-backed memory (mmap'd files)
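One caveat when adding these numbers up across processes: shared pages are counted in full by every process that maps them, so the sum of all RSS values usually exceeds what free reports as used. A quick sketch to see this on your own system:

```shell
# Sum RSS across every process and compare with free's "used".
total_rss=$(ps -eo rss= | awk '{sum += $1} END {print sum}')   # in kB
used=$(free -k | awk '/^Mem:/ {print $3}')                     # in kB
echo "Sum of per-process RSS: $((total_rss / 1024)) MiB"
echo "free reports used:      $((used / 1024)) MiB"
```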
Memory Leak Detection Basics
A memory leak occurs when a process continuously allocates memory without freeing it.
Spotting a Leak
# Watch a process's memory over time
$ while true; do
ps -p 1234 -o pid,rss,vsz,comm --no-headers
sleep 60
done | tee /tmp/mem-watch.log
# If RSS grows continuously without leveling off, it is likely a leak.
# Or use pidstat (from sysstat)
$ pidstat -r -p 1234 60
14:30:00 PID minflt/s majflt/s VSZ RSS %MEM Command
14:31:00 1234 45.23 0.00 2415648 2048000 12.5 mysqld
14:32:00 1234 52.17 0.00 2415648 2052096 12.5 mysqld
14:33:00 1234 48.90 0.00 2419744 2060288 12.6 mysqld ← growing
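The same idea as a minimal two-sample sketch; PID 1 is used only so the example runs anywhere -- substitute the PID you actually suspect:

```shell
# Sample a process's RSS twice and report the change.
pid=1                        # replace with the suspect PID
rss1=$(ps -p "$pid" -o rss=)
sleep 2
rss2=$(ps -p "$pid" -o rss=)
echo "RSS changed by $((rss2 - rss1)) kB in 2 seconds"
```

A delta that stays positive across many samples suggests a leak; occasional growth that levels off does not.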
Using valgrind (For Development)
# Run a program under valgrind to detect leaks
$ valgrind --leak-check=full --show-leak-kinds=all ./my_application
==12345== LEAK SUMMARY:
==12345== definitely lost: 1,234 bytes in 5 blocks
==12345== indirectly lost: 5,678 bytes in 12 blocks
Cgroups Memory Limits
Control groups (cgroups) allow you to limit memory usage for a process or group of processes. This prevents a single application from consuming all system memory.
Using systemd (cgroups v2)
# Limit a service to 512 MB of memory
$ sudo systemctl set-property myapp.service MemoryMax=512M
# Or in the service file
$ sudo vim /etc/systemd/system/myapp.service
[Service]
MemoryMax=512M
# No swap allowed for this unit (systemd unit files do not support
# trailing comments, so keep comments on their own lines)
MemorySwapMax=0
# Start throttling at 400M, before the hard limit
MemoryHigh=400M
# Check current memory usage of a service
$ systemctl status myapp.service
Memory: 234.5M (max: 512.0M available: 277.5M)
# Or via cgroupfs
$ cat /sys/fs/cgroup/system.slice/myapp.service/memory.current
245891072
$ cat /sys/fs/cgroup/system.slice/myapp.service/memory.max
536870912
Manual cgroups v2
# Create a cgroup
$ sudo mkdir /sys/fs/cgroup/mygroup
# Set memory limit (256 MB)
$ echo 268435456 | sudo tee /sys/fs/cgroup/mygroup/memory.max
# Add a process to the cgroup
$ echo 1234 | sudo tee /sys/fs/cgroup/mygroup/cgroup.procs
# Check usage
$ cat /sys/fs/cgroup/mygroup/memory.current
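The paths above assume the unified cgroups v2 hierarchy. A quick check before you start (this sketch only reads, so no root is needed):

```shell
# cgroups v2 mounts a single "cgroup2" filesystem at /sys/fs/cgroup;
# the cgroup.controllers file only exists on the unified hierarchy.
if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
    echo "cgroups v2 (unified hierarchy)"
    cat /sys/fs/cgroup/cgroup.controllers
else
    echo "cgroups v1 (legacy or hybrid hierarchy)"
fi
```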
Hands-On: Memory Pressure Simulation
# Install stress-ng for memory testing
$ sudo apt install stress-ng
# Allocate 2 GB of memory for 30 seconds
$ stress-ng --vm 1 --vm-bytes 2G --timeout 30s &
# In another terminal, watch the memory impact
$ watch -n1 free -h
# Watch the OOM score change
$ watch -n1 "cat /proc/$(pgrep -f stress-ng | head -1)/oom_score"
Think About It: If you run stress-ng --vm 1 --vm-bytes 20G on a system with 16 GB of RAM and 4 GB of swap, what happens? Which kernel subsystem intervenes?
Debug This
A server has 32 GB of RAM. An application team says the server is "out of memory" and requests a RAM upgrade. Here is what you see:
$ free -h
total used free shared buff/cache available
Mem: 32Gi 28Gi 512Mi 128Mi 3.5Gi 3.8Gi
Swap: 8.0Gi 2.1Gi 5.9Gi
Questions to ask:
- Is the server truly out of memory? The available column shows 3.8 GB, so there is still memory available.
- But swap is being used (2.1 GB). This means the system has been under memory pressure at some point. Active swapping causes performance issues.
- What is using the memory?
$ ps aux --sort=-%mem | head -5
USER PID %CPU %MEM VSZ RSS COMMAND
java 4567 5.2 62.5 24G 20G java -Xmx20G ...
The Java application has a 20 GB heap configured. On a 32 GB system, that leaves only 12 GB for the OS, cache, and all other processes. The real fix is not more RAM -- it is right-sizing the Java heap.
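To find out which processes are holding that 2.1 GB of swap, you can read VmSwap from /proc/PID/status -- a sketch:

```shell
# Rank processes by swapped-out memory (VmSwap, in kB).
for pid in /proc/[0-9]*; do
    swap=$(awk '/^VmSwap:/ {print $2}' "$pid/status" 2>/dev/null)
    [ -n "$swap" ] && [ "$swap" -gt 0 ] && \
        echo "$swap kB ${pid#/proc/} $(cat "$pid/comm" 2>/dev/null)"
done | sort -rn | head -10
```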
┌──────────────────────────────────────────────────────────┐
│ What Just Happened? │
├──────────────────────────────────────────────────────────┤
│ │
│ Linux memory management key points: │
│ │
│ Reading free: │
│ - Ignore "free" -- look at "available" │
│ - buff/cache is reclaimable, not wasted │
│ - Swap usage indicates past memory pressure │
│ │
│ Swap: │
│ - Overflow for when RAM is full │
│ - swappiness controls how aggressively pages are swapped │
│ - Set low (10-20) for latency-sensitive workloads │
│ │
│ OOM Killer: │
│ - Last resort when RAM + swap are exhausted │
│ - Kills the highest oom_score process │
│ - Protect critical services with oom_score_adj=-1000 │
│ │
│ Controlling memory: │
│ - cgroups/systemd MemoryMax to limit per-service usage │
│ - Find leaks with pidstat/valgrind │
│ - Monitor with /proc/meminfo and ps/smem │
│ │
└──────────────────────────────────────────────────────────┘
Try This
- Read free correctly: Run free -h on your system. Calculate what percentage of total RAM is actually available. Is the system under memory pressure?
- Watch caching: Clear the page cache (echo 3 > /proc/sys/vm/drop_caches), note the free and buff/cache values, then read a large file. Watch buff/cache grow and free shrink. Verify that available barely changes.
- Swappiness experiment: Check your current swappiness. Create a swap file, change swappiness to 100, and run a memory stress test. Then change swappiness to 10 and repeat. Compare how quickly swap is used.
- OOM exploration: Find the OOM scores of all your running processes. Which process would the OOM killer target first? Set oom_score_adj=-1000 on your SSH daemon to protect it.
- Bonus challenge: Create a systemd service with MemoryMax=100M that runs a script which tries to allocate 200 MB. Watch the OOM killer terminate it. Check journalctl -k for the OOM event.