Disk Management in Production

Why This Matters

It is 2 AM and your monitoring alerts you: the database server's /var/lib/mysql partition is 95% full. The database will stop accepting writes within hours. If this were a traditional fixed partition, you would be looking at downtime -- adding a new disk, copying data, resizing partitions, and praying nothing goes wrong.

But this server uses LVM. You attach a new disk, extend the volume group, grow the logical volume, and resize the filesystem -- all without unmounting, all without downtime. The database never notices.

On another server, one of three disks in a RAID array has failed. The array is still serving data because RAID provides redundancy. You hot-swap in a replacement disk, add it to the array, and the array rebuilds itself while the application keeps running.

This is what production disk management looks like. LVM gives you flexibility. RAID gives you resilience. Together, they are the foundation of reliable storage in any serious Linux environment. This chapter teaches you both.


Try This Right Now

Check your current disk and partition layout:

$ lsblk
$ df -hT
$ cat /proc/mdstat    # shows RAID status (no arrays listed if none exist)
$ sudo lvs 2>/dev/null    # shows logical volumes (empty if no LVM)
$ sudo pvs 2>/dev/null    # shows physical volumes
$ sudo vgs 2>/dev/null    # shows volume groups

If these LVM commands return nothing, you may not have LVM set up yet -- which is exactly what we are about to learn.


LVM: Logical Volume Management

The Problem LVM Solves

Traditional partitioning is rigid. When you create a 50 GB partition for /home, that is all you get. If you need more space, you have difficult choices: resize the partition (risky), move data to a bigger disk, or add a mount point and split your data.

LVM adds a layer of abstraction between your physical disks and your filesystems. This abstraction gives you the ability to:

  • Resize volumes while they are mounted and in use
  • Span a single volume across multiple physical disks
  • Take snapshots of volumes for backups
  • Move data between physical disks without downtime

The Three Layers of LVM

LVM has three layers, and understanding them is essential:

┌─────────────────────────────────────────────────┐
│                   FILESYSTEMS                   │
│              /home    /var    /data             │
├─────────────────────────────────────────────────┤
│              LOGICAL VOLUMES (LV)               │
│           lv_home   lv_var   lv_data            │
│      (These are what you format and mount)      │
├─────────────────────────────────────────────────┤
│                VOLUME GROUP (VG)                │
│                     vg_main                     │
│     (A pool of storage from one or more PVs)    │
├─────────────────────────────────────────────────┤
│              PHYSICAL VOLUMES (PV)              │
│           /dev/sdb1       /dev/sdc1             │
│     (Actual disk partitions or whole disks)     │
├─────────────────────────────────────────────────┤
│                 PHYSICAL DISKS                  │
│             /dev/sdb     /dev/sdc               │
└─────────────────────────────────────────────────┘

Physical Volume (PV): A disk or partition that has been initialized for use by LVM. Think of it as raw material entering a factory.

Volume Group (VG): A pool of storage formed by combining one or more PVs. Think of it as a warehouse where all the raw material is combined into one big pile.

Logical Volume (LV): A slice of storage carved out from a VG. This is what you actually format with a filesystem and mount. Think of it as the finished product cut from the pile.

Hands-On: Creating an LVM Setup

We will simulate this using loop devices (virtual block devices backed by files). This is safe to do on any system.

Step 1: Create virtual disks

# Create two 500 MB files to act as disks
$ sudo dd if=/dev/zero of=/tmp/disk1.img bs=1M count=500
$ sudo dd if=/dev/zero of=/tmp/disk2.img bs=1M count=500

# Attach them as loop devices (if these numbers are already taken,
# losetup -f shows the next free loop device)
$ sudo losetup /dev/loop10 /tmp/disk1.img
$ sudo losetup /dev/loop11 /tmp/disk2.img

# Verify
$ losetup -a | grep loop1
/dev/loop10: []: (/tmp/disk1.img)
/dev/loop11: []: (/tmp/disk2.img)

Step 2: Create Physical Volumes

$ sudo pvcreate /dev/loop10 /dev/loop11
  Physical volume "/dev/loop10" successfully created.
  Physical volume "/dev/loop11" successfully created.

# Inspect them
$ sudo pvs
  PV           VG   Fmt  Attr PSize   PFree
  /dev/loop10       lvm2 ---  500.00m 500.00m
  /dev/loop11       lvm2 ---  500.00m 500.00m

$ sudo pvdisplay /dev/loop10
  "/dev/loop10" is a new physical volume of "500.00 MiB"
  --- NEW Physical volume ---
  PV Name               /dev/loop10
  VG Name
  PV Size               500.00 MiB
  ...

Step 3: Create a Volume Group

$ sudo vgcreate vg_lab /dev/loop10 /dev/loop11
  Volume group "vg_lab" successfully created

$ sudo vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  vg_lab   2   0   0 wz--n- 992.00m 992.00m

$ sudo vgdisplay vg_lab
  --- Volume group ---
  VG Name               vg_lab
  VG Size               992.00 MiB
  PE Size               4.00 MiB
  Total PE              248
  Free  PE / Size       248 / 992.00 MiB
  ...

Notice that the VG size (992 MiB) is slightly less than the raw total (1000 MiB): LVM reserves a small amount of space on each PV for its metadata.
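
The vgdisplay output also makes the allocation unit visible: a VG is carved into physical extents (PEs), and every LV size is a whole multiple of the PE size. A quick sanity check of the numbers above:

```shell
# Total PE x PE Size should equal VG Size (values from vgdisplay above)
total_pe=248
pe_size_mib=4
echo "$(( total_pe * pe_size_mib )) MiB"   # 992 MiB
```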

Step 4: Create Logical Volumes

# Create a 400 MB logical volume
$ sudo lvcreate -n lv_data -L 400M vg_lab
  Logical volume "lv_data" created.

# Create another using 200 MB
$ sudo lvcreate -n lv_logs -L 200M vg_lab
  Logical volume "lv_logs" created.

$ sudo lvs
  LV      VG     Attr       LSize   Pool
  lv_data vg_lab -wi-a----- 400.00m
  lv_logs vg_lab -wi-a----- 200.00m

Step 5: Create filesystems and mount

# Format with ext4
$ sudo mkfs.ext4 /dev/vg_lab/lv_data
$ sudo mkfs.ext4 /dev/vg_lab/lv_logs

# Create mount points and mount
$ sudo mkdir -p /mnt/data /mnt/logs
$ sudo mount /dev/vg_lab/lv_data /mnt/data
$ sudo mount /dev/vg_lab/lv_logs /mnt/logs

# Verify
$ df -h /mnt/data /mnt/logs
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/vg_lab-lv_data  388M  2.3M  362M   1% /mnt/data
/dev/mapper/vg_lab-lv_logs  190M  1.6M  175M   1% /mnt/logs
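
Notice the device names df reports: device-mapper exposes each LV as /dev/mapper/VG-LV, joining the names with a single hyphen and doubling any hyphen that appears inside a VG or LV name so the mapping stays unambiguous. A minimal sketch of that translation (dm_name is my illustrative helper, not an LVM tool):

```shell
# translate "vg/lv" into the /dev/mapper name device-mapper would use
dm_name() {
  vg=${1%%/*}; lv=${1#*/}
  # hyphens inside either name are doubled; the separator stays single
  printf '/dev/mapper/%s-%s\n' \
    "$(printf '%s' "$vg" | sed 's/-/--/g')" \
    "$(printf '%s' "$lv" | sed 's/-/--/g')"
}
dm_name vg_lab/lv_data     # /dev/mapper/vg_lab-lv_data
dm_name vg-main/lv-data    # /dev/mapper/vg--main-lv--data
```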

Think About It: We have 992 MB in the volume group, and we have allocated 600 MB to logical volumes. What happens to the remaining 392 MB? Can we use it later?


Extending and Reducing LVM Volumes

This is where LVM truly shines -- resizing storage on the fly.

Extending a Logical Volume

The /mnt/data volume is getting full. Let us add 200 MB from the free space in the volume group:

# Check free space in the VG
$ sudo vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  vg_lab   2   2   0 wz--n- 992.00m 392.00m

# Extend the logical volume by 200 MB
$ sudo lvextend -L +200M /dev/vg_lab/lv_data
  Size of logical volume vg_lab/lv_data changed from 400.00 MiB to 600.00 MiB.
  Logical volume vg_lab/lv_data successfully resized.

# IMPORTANT: The LV is bigger, but the filesystem still sees the old size
$ df -h /mnt/data
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/vg_lab-lv_data  388M  2.3M  362M   1% /mnt/data

# Resize the filesystem to fill the new space
$ sudo resize2fs /dev/vg_lab/lv_data
resize2fs 1.47.0 (5-Feb-2023)
Filesystem at /dev/vg_lab/lv_data is mounted on /mnt/data; on-line resizing required
Performing an on-line resize of /dev/vg_lab/lv_data to 614400 (1k) blocks.

# Now the filesystem sees the new size
$ df -h /mnt/data
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/vg_lab-lv_data  580M  2.3M  545M   1% /mnt/data

You can combine both steps with a single command:

# The -r flag resizes the filesystem automatically
$ sudo lvextend -L +100M -r /dev/vg_lab/lv_data

Distro Note: For XFS filesystems (default on RHEL/CentOS/Fedora), use xfs_growfs /mnt/data instead of resize2fs. XFS can only grow, never shrink.
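
Because the grow tool differs by filesystem, scripts often branch on the type (available from df -T or findmnt -no FSTYPE). A sketch of that dispatch, with grow_fs as a hypothetical helper that just prints the command you would run:

```shell
# pick the right grow command for a filesystem type
grow_fs() {
  fstype=$1 device=$2 mountpoint=$3
  case "$fstype" in
    ext2|ext3|ext4) echo "resize2fs $device" ;;      # ext* resizes via the device
    xfs)            echo "xfs_growfs $mountpoint" ;; # XFS resizes via the mount point
    *) echo "no grow rule for $fstype" >&2; return 1 ;;
  esac
}
grow_fs ext4 /dev/vg_lab/lv_data /mnt/data    # resize2fs /dev/vg_lab/lv_data
grow_fs xfs  /dev/vg_lab/lv_data /mnt/data    # xfs_growfs /mnt/data
```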

Adding a New Disk to a Volume Group

When the entire volume group is full, you can add another physical disk:

# Create a third virtual disk
$ sudo dd if=/dev/zero of=/tmp/disk3.img bs=1M count=500
$ sudo losetup /dev/loop12 /tmp/disk3.img

# Initialize it as a PV and add to the VG
$ sudo pvcreate /dev/loop12
$ sudo vgextend vg_lab /dev/loop12

$ sudo vgs
  VG     #PV #LV #SN Attr   VSize  VFree
  vg_lab   3   2   0 wz--n- <1.46g 692.00m

You just expanded your storage pool without touching existing data. No unmounting, no reformatting, no data copying.

Reducing a Logical Volume

WARNING: Reducing a volume can destroy data if done incorrectly. Always back up first. XFS filesystems cannot be shrunk at all.

# Unmount first (required for shrinking)
$ sudo umount /mnt/logs

# Check the filesystem
$ sudo e2fsck -f /dev/vg_lab/lv_logs

# Shrink filesystem first, then LV
$ sudo resize2fs /dev/vg_lab/lv_logs 100M
$ sudo lvreduce -L 100M /dev/vg_lab/lv_logs
  WARNING: Reducing active logical volume to 100.00 MiB.
  THIS MAY DESTROY YOUR DATA (filesystem etc.)
  Do you really want to reduce vg_lab/lv_logs? [y/n]: y

# Remount
$ sudo mount /dev/vg_lab/lv_logs /mnt/logs

Or use the safe combined approach:

$ sudo umount /mnt/logs
$ sudo lvreduce -L 100M -r /dev/vg_lab/lv_logs
$ sudo mount /dev/vg_lab/lv_logs /mnt/logs

LVM Snapshots

LVM snapshots create a point-in-time copy of a logical volume. They are invaluable for backups and for testing changes safely.

# Create some test data
$ sudo sh -c 'echo "Important data - version 1" > /mnt/data/config.txt'

# Create a snapshot (100M for storing changes)
$ sudo lvcreate -s -n snap_data -L 100M /dev/vg_lab/lv_data
  Logical volume "snap_data" created.

# Now modify the original
$ sudo sh -c 'echo "Important data - version 2 (BROKEN)" > /mnt/data/config.txt'

# Mount the snapshot (read-only) to recover
$ sudo mkdir -p /mnt/snap
$ sudo mount -o ro /dev/vg_lab/snap_data /mnt/snap

# The snapshot still has the original data
$ cat /mnt/snap/config.txt
Important data - version 1

# Recover the file
$ sudo cp /mnt/snap/config.txt /mnt/data/config.txt

# Cleanup
$ sudo umount /mnt/snap
$ sudo lvremove /dev/vg_lab/snap_data

Snapshots use copy-on-write: they only store blocks that change in the original volume after the snapshot is taken. The snapshot volume needs to be large enough to hold all the changes that occur while it exists.
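
In practice you watch snapshot fullness with lvs (the Data% column reports how much of the snapshot's space is used, e.g. sudo lvs --noheadings -o lv_name,data_percent vg_lab). A toy checker illustrating the policy -- snap_check is my name for it, not an LVM command:

```shell
# warn before a snapshot fills up; LVM invalidates a snapshot at 100%
snap_check() {
  name=$1 pct=$2
  # compare on the integer part of the percentage
  if [ "${pct%.*}" -ge 80 ]; then
    echo "WARNING: $name at ${pct}% -- lvextend it or remove it"
  else
    echo "OK: $name at ${pct}%"
  fi
}
snap_check snap_data 42.5    # OK: snap_data at 42.5%
snap_check snap_data 91.0    # WARNING: snap_data at 91.0% ...
```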

Think About It: If you take a snapshot and then write 200 MB of new data to the original volume, but the snapshot only has 100 MB of space, what happens?


RAID: Redundant Array of Independent Disks

RAID combines multiple disks to provide redundancy, performance, or both. Linux supports software RAID through mdadm.

RAID Levels Explained

RAID 0 (Striping) - Performance, NO redundancy
┌─────────┐  ┌─────────┐
│ Disk 1  │  │ Disk 2  │
│ Block 1 │  │ Block 2 │
│ Block 3 │  │ Block 4 │
│ Block 5 │  │ Block 6 │
└─────────┘  └─────────┘
Min disks: 2 | Usable: 100% | Fault tolerance: NONE
If ANY disk fails, ALL data is lost.

RAID 1 (Mirroring) - Redundancy, reduced capacity
┌─────────┐  ┌─────────┐
│ Disk 1  │  │ Disk 2  │
│ Block 1 │  │ Block 1 │  (identical copy)
│ Block 2 │  │ Block 2 │  (identical copy)
│ Block 3 │  │ Block 3 │  (identical copy)
└─────────┘  └─────────┘
Min disks: 2 | Usable: 50% | Can lose 1 disk

RAID 5 (Striping + Distributed Parity)
┌──────────┐  ┌──────────┐  ┌──────────┐
│ Disk 1   │  │ Disk 2   │  │ Disk 3   │
│ Data A1  │  │ Data A2  │  │ Parity A │
│ Data B1  │  │ Parity B │  │ Data B2  │
│ Parity C │  │ Data C1  │  │ Data C2  │
└──────────┘  └──────────┘  └──────────┘
Min disks: 3 | Usable: (N-1)/N | Can lose 1 disk

RAID 6 (Striping + Double Distributed Parity)
Same as RAID 5 but with two parity blocks per stripe.
Min disks: 4 | Usable: (N-2)/N | Can lose 2 disks

RAID 10 (Mirror + Stripe)
┌─────────┐ ┌─────────┐   ┌─────────┐ ┌─────────┐
│ Disk 1  │ │ Disk 2  │   │ Disk 3  │ │ Disk 4  │
│ Block 1 │ │ Block 1 │   │ Block 2 │ │ Block 2 │
│ Block 3 │ │ Block 3 │   │ Block 4 │ │ Block 4 │
└─────────┘ └─────────┘   └─────────┘ └─────────┘
  Mirror 1                  Mirror 2
  ──────────────── Striped ────────────────
Min disks: 4 | Usable: 50% | Can lose 1 disk per mirror
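
The usable-capacity rules above reduce to simple arithmetic. A throwaway helper to compare levels -- raid_usable is my illustrative name, not an mdadm command:

```shell
# usable capacity in GB for a RAID level, given disk count and per-disk size
raid_usable() {
  level=$1 n=$2 disk_gb=$3
  case "$level" in
    0)  echo $(( n * disk_gb )) ;;          # stripe: all capacity, no safety
    1)  echo "$disk_gb" ;;                  # mirror: one disk's worth
    5)  echo $(( (n - 1) * disk_gb )) ;;    # one disk's worth lost to parity
    6)  echo $(( (n - 2) * disk_gb )) ;;    # two disks' worth lost to parity
    10) echo $(( n / 2 * disk_gb )) ;;      # half the disks hold mirrors
    *)  echo "unknown level: $level" >&2; return 1 ;;
  esac
}
raid_usable 5 4 2000     # 6000  (four 2 TB disks -> 6 TB usable)
raid_usable 10 4 2000    # 4000
```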

Hands-On: Creating a RAID 1 Array with mdadm

# Install mdadm
$ sudo apt install mdadm        # Debian/Ubuntu
$ sudo dnf install mdadm        # Fedora/RHEL

# Create two virtual disks for RAID
$ sudo dd if=/dev/zero of=/tmp/raid1.img bs=1M count=200
$ sudo dd if=/dev/zero of=/tmp/raid2.img bs=1M count=200
$ sudo losetup /dev/loop20 /tmp/raid1.img
$ sudo losetup /dev/loop21 /tmp/raid2.img

# Create a RAID 1 array
$ sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 \
    /dev/loop20 /dev/loop21
mdadm: array /dev/md0 started.

# Check the status
$ cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 loop21[1] loop20[0]
      200576 blocks super 1.2 [2/2] [UU]

# The [UU] means both disks are Up. [U_] would mean one is missing.

# Detailed information
$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Sat Jan 18 10:30:00 2025
     Raid Level : raid1
     Array Size : 200576 (195.89 MiB)
  Used Dev Size : 200576 (195.89 MiB)
   Raid Devices : 2
  Total Devices : 2
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
          State : clean

# Format and mount
$ sudo mkfs.ext4 /dev/md0
$ sudo mkdir -p /mnt/raid
$ sudo mount /dev/md0 /mnt/raid

Simulating a Disk Failure and Recovery

# Write some data
$ sudo sh -c 'echo "Critical data on RAID" > /mnt/raid/important.txt'

# Simulate a disk failure
$ sudo mdadm --manage /dev/md0 --fail /dev/loop20
mdadm: set /dev/loop20 faulty in /dev/md0

$ cat /proc/mdstat
md0 : active raid1 loop21[1] loop20[0](F)
      200576 blocks super 1.2 [2/1] [_U]

# [_U] -- first disk is down, second is up
# But data is still accessible!
$ cat /mnt/raid/important.txt
Critical data on RAID

# Remove the failed disk
$ sudo mdadm --manage /dev/md0 --remove /dev/loop20

# Add a replacement disk
$ sudo dd if=/dev/zero of=/tmp/raid3.img bs=1M count=200
$ sudo losetup /dev/loop22 /tmp/raid3.img
$ sudo mdadm --manage /dev/md0 --add /dev/loop22

# Watch the rebuild
$ cat /proc/mdstat
md0 : active raid1 loop22[2] loop21[1]
      200576 blocks super 1.2 [2/1] [_U]
      [========>............]  recovery = 42.5% ...

# Wait for it to finish, then:
$ cat /proc/mdstat
md0 : active raid1 loop22[2] loop21[1]
      200576 blocks super 1.2 [2/2] [UU]

Monitoring RAID Health

# Check array status
$ sudo mdadm --detail /dev/md0

# Scan all arrays
$ sudo mdadm --examine --scan

# Set up email alerts for failures
$ sudo mdadm --monitor --mail=admin@example.com --delay=300 /dev/md0 &

# Or configure monitoring in mdadm.conf
$ cat /etc/mdadm/mdadm.conf
MAILADDR admin@example.com
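
For scripted checks (say, from cron), the [UU]-style status field in /proc/mdstat can be parsed directly: each underscore is a missing member, so zero underscores means the array is healthy. A minimal sketch, with degraded_count as my illustrative helper:

```shell
# count missing members in an mdstat status field like "[UU]" or "[_U]"
degraded_count() {
  # splitting on "_" yields one more field than there are underscores
  printf '%s\n' "$1" | awk -F '_' '{ print NF - 1 }'
}
degraded_count "[UU]"    # 0 -> healthy
degraded_count "[_U]"    # 1 -> one device missing
```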

Disk Health Monitoring with smartctl

Disks warn you before they die -- if you are listening. SMART (Self-Monitoring, Analysis, and Reporting Technology) tracks disk health indicators.

# Install smartmontools
$ sudo apt install smartmontools    # Debian/Ubuntu
$ sudo dnf install smartmontools    # Fedora/RHEL

# Check if a disk supports SMART
$ sudo smartctl -i /dev/sda

# View overall health
$ sudo smartctl -H /dev/sda
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

# View detailed attributes
$ sudo smartctl -A /dev/sda
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   14523
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   0

Key attributes to watch:

  Attribute                What It Means                    Worry When
  -----------------------  -------------------------------  ----------------------
  Reallocated_Sector_Ct    Bad sectors replaced by spares   Any value > 0
  Current_Pending_Sector   Sectors waiting to be remapped   Any value > 0
  Offline_Uncorrectable    Sectors that could not be read   Any value > 0
  Power_On_Hours           Total hours of operation         Approaching rated life
  Temperature_Celsius      Current temperature              Above 50°C for HDDs

# Run a short self-test
$ sudo smartctl -t short /dev/sda

# Run a long self-test (can take hours)
$ sudo smartctl -t long /dev/sda

# Check test results
$ sudo smartctl -l selftest /dev/sda

# Enable automatic monitoring daemon
$ sudo systemctl enable --now smartd

Distro Note: On RHEL/CentOS, the smartd configuration is at /etc/smartmontools/smartd.conf. On Debian/Ubuntu, it is at /etc/smartd.conf.
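
As a concrete illustration (the mail address and test schedule are placeholders to adapt), a one-line smartd configuration that monitors every detected disk, emails on trouble, and schedules a short self-test nightly at 02:00 plus a long test on Saturdays at 03:00 might look like:

```
# DEVICESCAN monitors every SMART-capable disk smartd finds
# -a = track all attributes, -m = mail address for warnings
# -s regex = self-test schedule (S = short, L = long; fields are MM/DD/d/HH)
DEVICESCAN -a -m admin@example.com -s (S/../.././02|L/../../6/03)
```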


Debug This

A junior admin reports: "I extended the logical volume but the filesystem still shows the old size."

$ sudo lvs
  LV      VG      Attr       LSize
  lv_app  vg_prod -wi-ao---- 100.00g

$ df -h /app
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/vg_prod-lv_app    50G   45G  2.5G  95% /app

The LV is 100 GB but the filesystem only sees 50 GB. What did they forget?

Answer: They forgot to resize the filesystem after extending the LV. The fix depends on the filesystem type:

# For ext4:
$ sudo resize2fs /dev/vg_prod/lv_app

# For XFS:
$ sudo xfs_growfs /app

This is one of the most common LVM mistakes. The -r flag on lvextend would have handled this automatically:

$ sudo lvextend -L 100G -r /dev/vg_prod/lv_app

Cleanup

If you followed along with the lab, clean up the loop devices:

$ sudo umount /mnt/data /mnt/logs /mnt/raid /mnt/snap 2>/dev/null
$ sudo lvremove -f vg_lab/lv_data vg_lab/lv_logs 2>/dev/null
$ sudo vgremove vg_lab 2>/dev/null
$ sudo pvremove /dev/loop10 /dev/loop11 /dev/loop12 2>/dev/null
$ sudo mdadm --stop /dev/md0 2>/dev/null
$ sudo losetup -d /dev/loop10 /dev/loop11 /dev/loop12 /dev/loop20 /dev/loop21 /dev/loop22 2>/dev/null
$ sudo rm -f /tmp/disk1.img /tmp/disk2.img /tmp/disk3.img /tmp/raid1.img /tmp/raid2.img /tmp/raid3.img

┌──────────────────────────────────────────────────────────┐
│                  What Just Happened?                      │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  LVM provides flexible storage management:                │
│  - PV (Physical Volume) → raw disk/partition              │
│  - VG (Volume Group)    → pool of PVs                     │
│  - LV (Logical Volume)  → usable slice from a VG          │
│                                                           │
│  Key LVM operations:                                      │
│  - pvcreate/vgcreate/lvcreate → build the stack           │
│  - lvextend -r  → grow a volume + filesystem              │
│  - lvreduce -r  → shrink (backup first!)                  │
│  - lvcreate -s  → snapshot for backup/testing             │
│                                                           │
│  RAID provides disk redundancy:                           │
│  - RAID 0 = speed, no safety                              │
│  - RAID 1 = mirror, can lose one disk                     │
│  - RAID 5 = parity across 3+ disks                        │
│  - RAID 10 = mirror + stripe (production favorite)        │
│                                                           │
│  smartctl monitors disk health before failure.            │
│                                                           │
└──────────────────────────────────────────────────────────┘

Try This

  1. LVM basics: Create three loop devices, combine them into a volume group, create two logical volumes, format them with ext4, and mount them.

  2. Online resize: Write a 50 MB file to one of your logical volumes, then extend the volume by 200 MB using lvextend -r. Verify the file is still intact.

  3. Snapshot backup: Create a snapshot of a logical volume, write new files to the original, then mount the snapshot read-only and verify it still has the old data.

  4. RAID simulation: Create a RAID 5 array with three loop devices. Write data, mark one device as failed, verify data is still readable, then add a replacement and watch the rebuild.

  5. Bonus challenge: Combine LVM and RAID -- create a RAID 1 array with mdadm, then use the RAID device as a physical volume for LVM. This is how many production servers are configured.