Memory Management

Why This Matters

Your monitoring dashboard shows the server using 95% of its 16 GB RAM. Should you panic? Maybe not. Linux aggressively uses free memory for disk caching. That 95% might include 8 GB of cache that can be reclaimed instantly when applications need it. But if applications are genuinely consuming all the RAM, the kernel's OOM (Out of Memory) killer will start terminating processes -- and it might choose your database.

Understanding Linux memory management is the difference between a calm "that is just cache" and a panicked "we need to add RAM immediately." It is the difference between knowing why your Java application was killed at 3 AM and being baffled by a mystery crash.

This chapter explains how Linux manages memory, how to read memory statistics correctly, how swap and the OOM killer work, and how to control memory usage with cgroups.


Try This Right Now

# See your memory usage
$ free -h
               total        used        free      shared  buff/cache   available
Mem:            16Gi       4.5Gi       3.2Gi       256Mi       8.3Gi        11Gi
Swap:          4.0Gi          0B       4.0Gi

# What does the kernel think?
$ head -10 /proc/meminfo

# Which processes use the most memory?
$ ps aux --sort=-%mem | head -10

# Current swap usage
$ swapon --show

# OOM score of a running process (pick any PID)
$ cat /proc/1/oom_score

Virtual Memory Recap

Every process in Linux believes it has its own private, contiguous block of memory. This is virtual memory -- an illusion maintained by the kernel and the CPU's Memory Management Unit (MMU).

┌──────────────────────────────────────────────────────────┐
│                      VIRTUAL MEMORY                      │
│                                                          │
│  Process A sees:          Process B sees:                │
│  ┌──────────────┐         ┌──────────────┐               │
│  │ 0x0000 Code  │         │ 0x0000 Code  │               │
│  │ 0x1000 Data  │         │ 0x1000 Data  │               │
│  │ 0x2000 Heap  │         │ 0x2000 Heap  │               │
│  │   ...        │         │   ...        │               │
│  │ 0xFFFF Stack │         │ 0xFFFF Stack │               │
│  └──────┬───────┘         └──────┬───────┘               │
│         │                        │                       │
│         │  Page Table            │  Page Table           │
│         ▼  Mapping               ▼  Mapping              │
│  ┌──────────────────────────────────────────┐            │
│  │              PHYSICAL RAM                │            │
│  │  ┌────┬────┬────┬────┬────┬────┬────┐    │            │
│  │  │ A  │ B  │ A  │ K  │ B  │ A  │ B  │    │            │
│  │  └────┴────┴────┴────┴────┴────┴────┘    │            │
│  │  Pages scattered across physical RAM     │            │
│  └──────────────────────────────────────────┘            │
│                                                          │
│  When physical RAM is full, the kernel swaps             │
│  inactive pages to disk (swap space).                    │
└──────────────────────────────────────────────────────────┘

Key concepts:

  • Memory is divided into pages (typically 4 KB each)
  • The kernel maps virtual pages to physical RAM frames
  • Processes can allocate more virtual memory than physical RAM exists
  • When RAM is full, inactive pages are moved to swap on disk
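
These numbers are easy to inspect. A quick sketch (getconf PAGE_SIZE is standard; the frame count is an approximation):

```shell
# Page size used for the mappings above (typically 4096 bytes on x86-64)
page_size=$(getconf PAGE_SIZE)
echo "Page size: ${page_size} bytes"

# Rough number of physical page frames backing total RAM
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "Approx. page frames: $(( total_kb * 1024 / page_size ))"
```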

The free Command: Reading It Correctly

$ free -h
               total        used        free      shared  buff/cache   available
Mem:            16Gi       4.5Gi       3.2Gi       256Mi       8.3Gi        11Gi
Swap:          4.0Gi          0B       4.0Gi

What Each Column Means

Column       Meaning
total        Total physical RAM installed
used         RAM used by processes AND kernel
free         RAM not being used for anything at all
shared       Memory used by tmpfs (shared memory)
buff/cache   Memory used for disk buffers and file cache
available    Memory available for new processes (free + reclaimable cache)

The Critical Insight: free vs available

Do not look at free. Look at available.

The free column shows memory that is completely unused. But Linux uses "free" memory for file caching -- keeping recently read file data in RAM so that future reads are fast. This cache is instantly reclaimable when applications need memory.

Example:
  total = 16 GB
  used  =  4.5 GB   (applications)
  free  =  3.2 GB   (truly idle)
  cache =  8.3 GB   (file cache, reclaimable)
  available = 11 GB  (free + reclaimable cache)

Is this server out of memory? NO!
It has 11 GB available for applications.
The 8.3 GB of cache is HELPING performance, not wasting RAM.
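
free itself reads /proc/meminfo, so the arithmetic can be reproduced directly. A sketch (in modern procps, the buff/cache column is Buffers + Cached + SReclaimable):

```shell
# Rebuild free's Mem: columns from /proc/meminfo (values in kB)
awk '/^MemFree:/      {free=$2}
     /^MemAvailable:/ {avail=$2}
     /^Buffers:/      {buf=$2}
     /^Cached:/       {cache=$2}
     /^SReclaimable:/ {srec=$2}
     END {
         printf "free:        %d kB\n", free
         printf "buff/cache:  %d kB\n", buf + cache + srec
         printf "available:   %d kB\n", avail
     }' /proc/meminfo
```

available usually comes out a bit below free + buff/cache, because the kernel subtracts low watermarks and the parts of the cache that cannot be dropped.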

Think About It: A new administrator sees free showing 200 MB on a 64 GB server and wants to add more RAM. What would you tell them? What number should they actually check?


Buffers vs Cache

Both are forms of disk caching, but they serve different purposes:

$ grep -E "^(Buffers|Cached|SReclaimable)" /proc/meminfo
Buffers:          123456 kB
Cached:          3504576 kB
SReclaimable:     245760 kB

Buffers: Cache for raw block device I/O (disk metadata, superblocks, directory entries). Small in size.

Cached: Cache for file content. When you read a file, its contents are kept in the page cache. This is usually the large portion.

SReclaimable: Slab memory that can be reclaimed (kernel data structures like inode cache, dentry cache).

# Watch cache fill up as you read files
$ free -h    # note the buff/cache
$ sudo find /usr -type f -exec cat {} \; > /dev/null 2>&1
$ free -h    # buff/cache should have grown

# Drop caches (for testing only, not for production!)
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ free -h    # cache dropped, free increased

WARNING: Dropping caches in production causes performance degradation as files must be re-read from disk. Only do this for diagnostic purposes.


Swap: The Overflow Lane

Swap is disk space used as an extension of RAM. When physical memory is full, the kernel moves inactive pages to swap, freeing RAM for active processes.

# View current swap
$ swapon --show
NAME      TYPE      SIZE   USED PRIO
/dev/sda3 partition 4G     0B   -2

# Or from /proc
$ cat /proc/swaps
Filename              Type        Size       Used    Priority
/dev/sda3             partition   4194300    0       -2

# View swap in free output
$ free -h | grep Swap
Swap:          4.0Gi          0B       4.0Gi

Creating Swap Space

# Create a swap file (2 GB)
$ sudo fallocate -l 2G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile

# Verify
$ swapon --show

# Make it permanent
$ echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

# Remove swap
$ sudo swapoff /swapfile
$ sudo rm /swapfile
# (and remove the fstab entry)
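
One caveat: on some filesystems (copy-on-write ones in particular) swapon may reject a file created with fallocate because it contains holes or shared extents; dd avoids that by writing every byte. A sketch, sized down and pointed at the throwaway path /tmp/swapfile.demo so it can run without touching real swap:

```shell
# Write zeroes instead of merely reserving blocks; swapon requires a
# file with no holes
dd if=/dev/zero of=/tmp/swapfile.demo bs=1M count=64 status=none
chmod 600 /tmp/swapfile.demo
ls -lh /tmp/swapfile.demo

# For a real swap file you would continue with:
#   sudo mkswap /swapfile && sudo swapon /swapfile
rm /tmp/swapfile.demo
```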

Swappiness: How Aggressively to Swap

The swappiness parameter controls how aggressively the kernel moves pages to swap:

# View current swappiness (default is usually 60)
$ cat /proc/sys/vm/swappiness
60

# Temporarily change it
$ sudo sysctl vm.swappiness=10

# Make it permanent
$ echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.d/99-swappiness.conf
$ sudo sysctl --system

Swappiness   Behavior
0            Avoid swap unless absolutely necessary (kernel may still swap)
10           Swap only when under heavy memory pressure (good for databases)
60           Default -- balanced behavior
100          Aggressively swap (favors keeping cache over anonymous pages)

For database servers and latency-sensitive applications, a low swappiness (10-20) is common because swapping causes latency spikes. For general-purpose servers, the default 60 is usually fine.

Think About It: Why might you NOT want to set swappiness to 0 on a database server? What happens if a memory leak slowly consumes all RAM and there is no swap?


The OOM Killer

When the system runs out of both RAM and swap, the kernel invokes the OOM (Out of Memory) Killer. Its job is to kill processes to free memory and keep the system alive.

How the OOM Killer Chooses Victims

Every process has an OOM score from 0 to 1000. The process with the highest score gets killed first.

# View OOM score of a process
$ cat /proc/1234/oom_score
15

# View OOM score adjustment
$ cat /proc/1234/oom_score_adj
0

The OOM score is calculated mainly from:

  • How much memory the process is using -- its RSS, swap usage, and page tables (more memory = higher score)
  • The oom_score_adj adjustment (-1000 to +1000)

Older kernels also weighed factors such as runtime and nice level; modern kernels use the memory-proportional heuristic above.
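
You can rank the current candidates straight from /proc; a minimal sketch:

```shell
# The five processes the OOM killer would consider first
# (highest oom_score wins)
for dir in /proc/[0-9]*; do
    score=$(cat "$dir/oom_score" 2>/dev/null) || continue
    printf '%s %s %s\n' "$score" "${dir##*/}" "$(cat "$dir/comm" 2>/dev/null)"
done | sort -rn | head -5
```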

Protecting Critical Processes from OOM

# Make a process immune to the OOM killer
$ echo -1000 | sudo tee /proc/1234/oom_score_adj

# Make a process more likely to be killed
$ echo 500 | sudo tee /proc/5678/oom_score_adj

Common strategy:

Process      oom_score_adj   Rationale
sshd         -1000           Never kill SSH -- you need it to fix things
database     -500            Protect critical data
web server   0               Default priority
batch jobs   500             Kill these first

Setting OOM Protection in systemd Services

# In a systemd service file
[Service]
OOMScoreAdjust=-1000

Detecting OOM Kills

# Check kernel logs for OOM events
$ dmesg | grep -i "oom"
[12345.678901] Out of memory: Killed process 5678 (java) total-vm:8388608kB, anon-rss:6291456kB

# Or in journald
$ journalctl -k | grep -i oom

# Check if a specific process was OOM-killed
$ journalctl -k | grep "Killed process"

OOM Anatomy: What an OOM Kill Looks Like

Jan 18 14:30:00 server kernel: java invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE)
Jan 18 14:30:00 server kernel: Mem-Info:
Jan 18 14:30:00 server kernel: active_anon:4012345 inactive_anon:23456 ...
Jan 18 14:30:00 server kernel: Free swap  = 0kB
Jan 18 14:30:00 server kernel: Total swap = 4194300kB
Jan 18 14:30:00 server kernel: Out of memory: Killed process 5678 (java)
                                total-vm:8388608kB, anon-rss:6291456kB, file-rss:12345kB

This tells you:

  • java triggered the OOM killer
  • Swap is 100% full (Free swap = 0kB)
  • The java process was using ~6 GB of RSS (resident set size)

/proc/meminfo Deep Dive

$ cat /proc/meminfo

Key fields explained:

Field             Meaning
MemTotal          Total usable RAM (slightly less than physical due to kernel reservations)
MemFree           Completely unused RAM
MemAvailable      Estimated memory available for applications (includes reclaimable cache)
Buffers           Raw disk block cache
Cached            File content cache (page cache)
SwapCached        Swap data also in RAM (avoids re-reading from swap)
Active            Recently accessed memory (less likely to be reclaimed)
Inactive          Not recently accessed (candidate for reclamation or swap)
Active(anon)      Anonymous pages (application heap/stack) recently used
Inactive(anon)    Anonymous pages not recently used (swap candidates)
Active(file)      File-backed pages recently used
Inactive(file)    File-backed pages (reclaimable without swap)
Dirty             Pages modified in memory but not yet written to disk
Slab              Kernel data structure cache
SReclaimable      Slab memory that can be freed
SUnreclaim        Slab memory that cannot be freed
Committed_AS      Total memory committed (promised) to processes
VmallocTotal      Total vmalloc address space
HugePages_Total   Number of hugepages allocated

# Quick check: is the system under memory pressure?
$ awk '/MemAvailable/ {avail=$2} /MemTotal/ {total=$2} END {printf "%.1f%% available\n", avail/total*100}' /proc/meminfo
68.3% available

Finding Memory-Hungry Processes

Using ps

# Top 10 memory consumers by RSS (Resident Set Size)
$ ps aux --sort=-%mem | head -11
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
mysql     1234  2.3  12.5 2415648 2048000 ?     Ssl  Jan15  48:23 /usr/sbin/mysqld
www-data  5678  0.8   5.2  892416  851968 ?     S    Jan15  12:45 /usr/sbin/apache2
redis     9012  0.1   3.1  234567  507904 ?     Ssl  Jan15   2:34 /usr/bin/redis-server

# VSZ = Virtual Size (address space, often much larger than actual usage)
# RSS = Resident Set Size (actual physical RAM used)
# %MEM = RSS as a percentage of total RAM

Using smem (More Accurate)

# Install smem
$ sudo apt install smem

# smem accounts for shared memory properly
$ sudo smem -rkt -s pss | head -10
  PID User     Command                         Swap      USS      PSS      RSS
 1234 mysql    /usr/sbin/mysqld                   0   1.90G    1.92G    1.95G
 5678 www-data /usr/sbin/apache2                  0  78.5M   120.3M   234.5M

USS (Unique Set Size): Memory uniquely owned by this process.
PSS (Proportional Set Size): USS + proportional share of shared memory.
RSS (Resident Set Size): All physical memory used, including shared.

PSS is the most accurate measure of a process's true memory footprint.
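
If smem is not available, the kernel exposes the same totals in /proc/PID/smaps_rollup (present since kernel 4.14). A sketch, using the current shell as a stand-in target:

```shell
# Rss and Pss for one process, summed over all of its mappings
pid=$$    # stand-in; substitute the PID you are investigating
grep -E '^(Rss|Pss):' "/proc/$pid/smaps_rollup"
```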

Using /proc/PID/status

$ grep -E "^(Name|VmSize|VmRSS|RssAnon|RssFile|VmSwap)" /proc/1234/status
Name:   mysqld
VmSize:  2415648 kB    # Virtual memory size
VmRSS:   2048000 kB    # Physical memory in use
RssAnon: 1900000 kB    # Anonymous (heap/stack) memory
RssFile:  148000 kB    # File-backed memory (mmap'd files)
VmSwap:        0 kB    # Memory swapped to disk

Memory Leak Detection Basics

A memory leak occurs when a process continuously allocates memory without freeing it.

Spotting a Leak

# Watch a process's memory over time
$ while true; do
    ps -p 1234 -o pid,rss,vsz,comm --no-headers
    sleep 60
  done | tee /tmp/mem-watch.log

# If RSS grows continuously without leveling off, it is likely a leak.

# Or use pidstat (from sysstat)
$ pidstat -r -p 1234 60
14:30:00  PID  minflt/s  majflt/s     VSZ       RSS    %MEM  Command
14:31:00  1234   45.23      0.00    2415648   2048000   12.5  mysqld
14:32:00  1234   52.17      0.00    2415648   2052096   12.5  mysqld
14:33:00  1234   48.90      0.00    2419744   2060288   12.6  mysqld  ← growing
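
The same watch needs nothing beyond /proc; a sketch sampling the current shell (substitute the suspect PID and a longer interval in practice):

```shell
pid=$$    # stand-in PID
for i in 1 2 3; do
    # VmRSS in /proc/PID/status is the process's resident memory in kB
    rss=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")
    echo "$(date +%s)  RSS: ${rss} kB"
    sleep 1
done
```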

Using valgrind (For Development)

# Run a program under valgrind to detect leaks
$ valgrind --leak-check=full --show-leak-kinds=all ./my_application
==12345== LEAK SUMMARY:
==12345==    definitely lost: 1,234 bytes in 5 blocks
==12345==    indirectly lost: 5,678 bytes in 12 blocks

Cgroups Memory Limits

Control groups (cgroups) allow you to limit memory usage for a process or group of processes. This prevents a single application from consuming all system memory.

Using systemd (cgroups v2)

# Limit a service to 512 MB of memory
$ sudo systemctl set-property myapp.service MemoryMax=512M

# Or in the service file
$ sudo vim /etc/systemd/system/myapp.service
[Service]
MemoryMax=512M
MemorySwapMax=0
MemoryHigh=400M

MemorySwapMax=0 forbids swap for the service, and MemoryHigh=400M starts reclaim pressure at 400 MB, before the hard MemoryMax limit is hit. (systemd does not allow comments at the end of a setting line, which is why the explanations live here.)

# Check current memory usage of a service
$ systemctl status myapp.service
    Memory: 234.5M (max: 512.0M available: 277.5M)

# Or via cgroupfs
$ cat /sys/fs/cgroup/system.slice/myapp.service/memory.current
245891072

$ cat /sys/fs/cgroup/system.slice/myapp.service/memory.max
536870912

Manual cgroups v2

# Create a cgroup
$ sudo mkdir /sys/fs/cgroup/mygroup

# Set memory limit (256 MB)
$ echo 268435456 | sudo tee /sys/fs/cgroup/mygroup/memory.max

# Add a process to the cgroup
$ echo 1234 | sudo tee /sys/fs/cgroup/mygroup/cgroup.procs

# Check usage
$ cat /sys/fs/cgroup/mygroup/memory.current

# Clean up: move the process back to the root cgroup, then remove the group
$ echo 1234 | sudo tee /sys/fs/cgroup/cgroup.procs
$ sudo rmdir /sys/fs/cgroup/mygroup

Hands-On: Memory Pressure Simulation

# Install stress-ng for memory testing
$ sudo apt install stress-ng

# Allocate 2 GB of memory for 30 seconds
$ stress-ng --vm 1 --vm-bytes 2G --timeout 30s &

# In another terminal, watch the memory impact
$ watch -n1 free -h

# Watch the OOM score change
$ watch -n1 "cat /proc/$(pgrep -f stress-ng | head -1)/oom_score"

Think About It: If you run stress-ng --vm 1 --vm-bytes 20G on a system with 16 GB of RAM and 4 GB of swap, what happens? Which kernel subsystem intervenes?


Debug This

A server has 32 GB of RAM. An application team says the server is "out of memory" and requests a RAM upgrade. Here is what you see:

$ free -h
               total        used        free      shared  buff/cache   available
Mem:            32Gi        28Gi       512Mi       128Mi       3.5Gi       3.8Gi
Swap:          8.0Gi       2.1Gi       5.9Gi

Questions to ask:

  1. Is the server truly out of memory? available shows 3.8 GB. There is still memory available.
  2. But swap is being used (2.1 GB). This means the system has been under memory pressure at some point. Active swapping causes performance issues.
  3. What is using the memory?
$ ps aux --sort=-%mem | head -5
USER    PID %CPU %MEM    VSZ    RSS    COMMAND
java   4567  5.2 62.5  24G    20G     java -Xmx20G ...

The Java application has a 20 GB heap configured. On a 32 GB system, that leaves only 12 GB for the OS, cache, and all other processes. The real fix is not more RAM -- it is right-sizing the Java heap.
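
A follow-up question here is what exactly is sitting in swap. Each process reports its swapped-out memory as VmSwap in /proc/PID/status; a sketch listing the top swap users:

```shell
# Top 5 processes by swapped-out memory (kB); empty output means
# nothing is currently swapped
for f in /proc/[0-9]*/status; do
    awk '/^Name:/ {name=$2} /^VmSwap:/ {if ($2 > 0) print $2, name}' "$f" 2>/dev/null
done | sort -rn | head -5
```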


┌──────────────────────────────────────────────────────────┐
│                  What Just Happened?                      │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  Linux memory management key points:                      │
│                                                           │
│  Reading free:                                            │
│  - Ignore "free" -- look at "available"                   │
│  - buff/cache is reclaimable, not wasted                  │
│  - Swap usage indicates past memory pressure              │
│                                                           │
│  Swap:                                                    │
│  - Overflow for when RAM is full                          │
│  - swappiness controls how aggressively pages are swapped │
│  - Set low (10-20) for latency-sensitive workloads        │
│                                                           │
│  OOM Killer:                                              │
│  - Last resort when RAM + swap are exhausted              │
│  - Kills the highest oom_score process                    │
│  - Protect critical services with oom_score_adj=-1000     │
│                                                           │
│  Controlling memory:                                      │
│  - cgroups/systemd MemoryMax to limit per-service usage   │
│  - Find leaks with pidstat/valgrind                       │
│  - Monitor with /proc/meminfo and ps/smem                 │
│                                                           │
└──────────────────────────────────────────────────────────┘

Try This

  1. Read free correctly: Run free -h on your system. Calculate what percentage of total RAM is actually available. Is the system under memory pressure?

  2. Watch caching: Clear the page cache (echo 3 > /proc/sys/vm/drop_caches), note the free and buff/cache values, then read a large file. Watch buff/cache grow and free shrink. Verify that available barely changes.

  3. Swappiness experiment: Check your current swappiness. Create a swap file, change swappiness to 100, and run a memory stress test. Then change swappiness to 10 and repeat. Compare how quickly swap is used.

  4. OOM exploration: Find the OOM scores of all your running processes. Which process would the OOM killer target first? Set oom_score_adj=-1000 on your SSH daemon to protect it.

  5. Bonus challenge: Create a systemd service with MemoryMax=100M that runs a script which tries to allocate 200 MB. Watch the OOM killer terminate it. Check journalctl -k for the OOM event.