Backup Strategies

Why This Matters

A small startup stored everything on a single server -- code, databases, customer data, configuration. They had no backups. One night, a disk failed. They lost three years of work, their customer database, and ultimately, the company.

This is not an unusual story. Backups are the single most important thing you can do for any system you care about. Yet they are routinely neglected, improperly configured, or never tested. The most dangerous backup is one you have never tried to restore.

This chapter teaches you practical, battle-tested backup strategies using open source tools. You will learn the 3-2-1 rule, understand different backup types, and get hands-on with tar, rsync, borgbackup, and restic. By the end, you will have the knowledge to build a backup system that could save your job -- or your company.


Try This Right Now

Before we build anything new, check what backup mechanisms already exist on your system:

# Check if any cron jobs reference backup
$ sudo crontab -l 2>/dev/null | grep -i backup
$ crontab -l 2>/dev/null | grep -i backup

# Check for systemd backup timers
$ systemctl list-timers --all | grep -i backup

# Check if common backup tools are installed
$ which rsync && rsync --version | head -1
$ which borg 2>/dev/null && borg --version   # the "borgbackup" package installs the "borg" command
$ which restic 2>/dev/null

# Check your disk usage (what needs backing up?)
$ df -h
$ du -sh /home /etc /var/log 2>/dev/null

The 3-2-1 Backup Rule

Before touching any tools, understand the strategy. The 3-2-1 rule is the gold standard:

┌──────────────────────────────────────────────────────┐
│              THE 3-2-1 BACKUP RULE                    │
│                                                       │
│  3  Keep at least THREE copies of your data           │
│     (1 primary + 2 backups)                           │
│                                                       │
│  2  Store backups on TWO different types of media     │
│     (local disk + cloud, or local disk + tape)        │
│                                                       │
│  1  Keep at least ONE copy offsite                    │
│     (different building, different city, cloud)       │
│                                                       │
│  WHY?                                                 │
│  - 1 copy: disk failure = total loss                  │
│  - 2 copies on same media: both can fail together     │
│  - 2 copies in same location: fire/flood = total loss │
│  - 3-2-1: survives any single disaster                │
└──────────────────────────────────────────────────────┘

Backup Types

Understanding the three main backup types helps you balance speed, storage, and recovery time:

Full Backup: Copies EVERYTHING every time
Day 1: [████████████████] 100 GB    ← complete copy
Day 2: [████████████████] 100 GB    ← another complete copy
Day 3: [████████████████] 100 GB    ← another complete copy
+ Simple to restore (just one backup needed)
- Uses the most storage and time

Incremental Backup: Copies only what changed SINCE LAST BACKUP
Day 1: [████████████████] 100 GB    ← full backup
Day 2: [██]               5 GB     ← changes since Day 1
Day 3: [█]                2 GB     ← changes since Day 2
Day 4: [███]               8 GB     ← changes since Day 3
+ Uses least storage
- Restore requires full + ALL incrementals in order

Differential Backup: Copies what changed SINCE LAST FULL
Day 1: [████████████████] 100 GB    ← full backup
Day 2: [██]               5 GB     ← changes since Day 1
Day 3: [███]               7 GB     ← changes since Day 1
Day 4: [█████]            12 GB     ← changes since Day 1
+ Restore requires only full + latest differential
- Uses more storage than incremental
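The trade-off is easy to quantify. A quick arithmetic sketch using the illustrative numbers from the diagrams above (100 GB dataset, roughly 5 GB of daily change, one week of retention):

```shell
# Storage needed for one week of backups (illustrative numbers only)
FULL=100; CHANGE=5; DAYS=7

full_total=$(( FULL * DAYS ))                  # a complete copy every day
incr_total=$(( FULL + CHANGE * (DAYS - 1) ))   # one full + daily deltas

# Differential grows each day: 5, 10, 15, ... GB since the last full
diff_total=$FULL
for (( d=1; d<DAYS; d++ )); do
    diff_total=$(( diff_total + CHANGE * d ))
done

echo "Full:         ${full_total} GB"     # 700 GB
echo "Incremental:  ${incr_total} GB"     # 130 GB
echo "Differential: ${diff_total} GB"     # 205 GB
```

Incremental wins on storage, but remember the restore-time cost: it needs the full plus every incremental, in order.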

Think About It: A critical server has 500 GB of data but only 2-3 GB changes per day. Which backup strategy would you choose and why? What if the recovery time objective is under 30 minutes?


tar: The Classic Backup Tool

tar (tape archive) has been the standard Unix backup tool since the 1970s. It creates archive files from directories.

Basic tar Backup

# Create a compressed backup of /etc
$ sudo tar -czf /backup/etc-$(date +%Y%m%d).tar.gz /etc
# -c = create
# -z = compress with gzip
# -f = filename

# List contents without extracting
$ tar -tzf /backup/etc-20250118.tar.gz | head -20
etc/
etc/hostname
etc/fstab
etc/passwd
...

# Extract to a specific directory
$ mkdir /tmp/restore_test
$ tar -xzf /backup/etc-20250118.tar.gz -C /tmp/restore_test
# -x = extract
# -C = change to directory before extracting

# Extract a single file
$ tar -xzf /backup/etc-20250118.tar.gz -C /tmp/ etc/fstab

Backup with Better Compression

# Use xz for better compression (slower but smaller files)
$ sudo tar -cJf /backup/etc-$(date +%Y%m%d).tar.xz /etc

# Use zstd for a good balance of speed and compression
$ sudo tar --zstd -cf /backup/etc-$(date +%Y%m%d).tar.zst /etc

# Compare sizes
$ ls -lh /backup/etc-20250118.*
-rw-r--r-- 1 root root  4.2M /backup/etc-20250118.tar.gz
-rw-r--r-- 1 root root  3.1M /backup/etc-20250118.tar.xz
-rw-r--r-- 1 root root  3.5M /backup/etc-20250118.tar.zst

Incremental Backups with tar

# Full backup (creates a snapshot file)
$ sudo tar -czf /backup/home-full-$(date +%Y%m%d).tar.gz \
    --listed-incremental=/backup/home.snar /home

# Next day: incremental backup (only changes since last)
$ sudo tar -czf /backup/home-inc-$(date +%Y%m%d).tar.gz \
    --listed-incremental=/backup/home.snar /home

# To restore: apply full first, then each incremental in order
$ tar -xzf /backup/home-full-20250118.tar.gz -C /
$ tar -xzf /backup/home-inc-20250119.tar.gz -C /
$ tar -xzf /backup/home-inc-20250120.tar.gz -C /

Excluding Files from Backups

# Exclude patterns
$ sudo tar -czf /backup/home.tar.gz \
    --exclude='*.tmp' \
    --exclude='.cache' \
    --exclude='node_modules' \
    --exclude='.local/share/Trash' \
    /home

# Or use an exclude file
$ cat /backup/exclude-list.txt
*.tmp
.cache
node_modules
.local/share/Trash
__pycache__

$ sudo tar -czf /backup/home.tar.gz \
    --exclude-from=/backup/exclude-list.txt /home

rsync: Efficient File Synchronization

While tar creates archives, rsync synchronizes files between locations. It only transfers differences, making it extremely efficient for repeated backups.

Basic rsync

# Sync a directory to a backup location
$ rsync -avh /home/user/projects/ /backup/projects/
# -a = archive mode (preserves permissions, timestamps, symlinks, etc.)
# -v = verbose
# -h = human-readable sizes

sending incremental file list
./
index.html
css/style.css
js/app.js

sent 45.23K bytes  received 92 bytes  90.64K bytes/sec
total size is 44.98K  speedup is 0.99

# Run it again -- only changes are transferred
$ rsync -avh /home/user/projects/ /backup/projects/
sending incremental file list

sent 234 bytes  received 12 bytes  492.00 bytes/sec
total size is 44.98K  speedup is 182.85

Notice the "speedup" -- the second run transferred almost nothing because nothing changed.

WARNING: Trailing slashes matter in rsync! /home/user/projects/ (with slash) syncs the contents of projects. /home/user/projects (without slash) syncs the directory itself, creating /backup/projects/projects/.

rsync with Delete (Mirror)

# Mirror source to destination (delete files in dest that are not in source)
$ rsync -avh --delete /home/user/projects/ /backup/projects/

WARNING: --delete removes files from the destination that no longer exist in the source. Always test with --dry-run first:

$ rsync -avh --delete --dry-run /home/user/projects/ /backup/projects/

rsync Over SSH

This is one of the most common backup patterns -- syncing data to a remote server over an encrypted SSH connection:

# Backup to a remote server
$ rsync -avh -e ssh /home/user/projects/ backupuser@backup-server:/backup/projects/

# With specific SSH key and port
$ rsync -avh -e "ssh -i ~/.ssh/backup_key -p 2222" \
    /home/user/projects/ backupuser@backup-server:/backup/projects/

# Limit bandwidth to 5 MB/s (useful over WAN)
$ rsync -avh --bwlimit=5000 -e ssh \
    /home/user/projects/ backupuser@backup-server:/backup/projects/

# Show progress
$ rsync -avh --progress -e ssh \
    /home/user/projects/ backupuser@backup-server:/backup/projects/

rsync Backup Script

A practical daily backup script using rsync:

#!/bin/bash
# backup.sh - Daily rsync backup with rotation

BACKUP_SRC="/home /etc /var/www"
BACKUP_DST="/backup"
DATE=$(date +%Y%m%d)
LOG="/var/log/backup-${DATE}.log"

echo "=== Backup started at $(date) ===" | tee -a "$LOG"

for src in $BACKUP_SRC; do
    dirname=$(basename "$src")
    echo "Backing up $src..." | tee -a "$LOG"
    rsync -ah --delete \
        --exclude='.cache' \
        --exclude='*.tmp' \
        "$src/" "${BACKUP_DST}/${dirname}/" \
        >> "$LOG" 2>&1
done

echo "=== Backup finished at $(date) ===" | tee -a "$LOG"

# Delete logs older than 30 days
find /var/log -name "backup-*.log" -mtime +30 -delete
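One weakness of a script like this: if a run takes longer than expected (slow network, huge changed files), the next scheduled run can start while the first is still going, and two rsyncs fight over the same destination. A common guard is an exclusive lock at the top of the script, using flock from util-linux; the lock file path below is only an example:

```shell
#!/bin/bash
# Abort early if another instance of the backup is still running.
# The lock file location is an example; any writable path works.
LOCKFILE="${LOCKFILE:-/tmp/backup.lock}"

exec 9>"$LOCKFILE"
if ! flock -n 9; then
    echo "Another backup is already running; exiting." >&2
    exit 1
fi

# ... the rsync loop from above goes here, protected by the lock ...
echo "Lock acquired; safe to run the backup."
```

The lock is released automatically when the script exits, so there is no cleanup to forget.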

borgbackup: Deduplicated, Encrypted Backups

borgbackup (borg) is a modern backup tool that provides deduplication, compression, and encryption. It is the tool of choice for many Linux administrators.

Why borg?

  • Deduplication: If a file appears in 100 backups, it is stored only once
  • Compression: Multiple algorithms (lz4, zstd, zlib, lzma)
  • Encryption: AES-256 encryption at rest
  • Efficient: Only transfers and stores unique data chunks
  • Pruning: Automatic retention policy management

Installing borg

# Debian/Ubuntu
$ sudo apt install borgbackup

# Fedora/RHEL
$ sudo dnf install borgbackup

# Arch
$ sudo pacman -S borg

# Or via pip
$ pip install borgbackup

Hands-On: Complete borg Workflow

Step 1: Initialize a repository

# Create a local encrypted repository
$ borg init --encryption=repokey /backup/borg-repo
Enter new passphrase:
Enter same passphrase again:
Do you want your passphrase to be displayed for verification? [yN]: y

# CRITICAL: Export and save the key somewhere safe!
$ borg key export /backup/borg-repo /backup/borg-key-backup.txt

WARNING: If you lose both the passphrase and the key, your backups are irrecoverable. Store the key export separately from the backups.

Step 2: Create a backup (archive)

# Create an archive named with the date
$ borg create --stats --progress \
    /backup/borg-repo::home-$(date +%Y%m%d-%H%M) \
    /home \
    --exclude '/home/*/.cache' \
    --exclude '/home/*/Downloads' \
    --exclude '*.tmp'

Archive name: home-20250118-1430
Archive fingerprint: a1b2c3...
Time (start): Sat, 2025-01-18 14:30:00
Time (end):   Sat, 2025-01-18 14:32:15
Duration: 2 minutes 15 seconds
Number of files: 28547
                       Original size      Compressed size    Deduplicated size
This archive:               12.45 GB              8.23 GB              2.15 GB
All archives:               12.45 GB              8.23 GB              2.15 GB

Notice the "Deduplicated size" -- this is the actual new data stored. After the first backup, subsequent backups store much less.

Step 3: Create another backup and see deduplication in action

# Next day, create another archive
$ borg create --stats /backup/borg-repo::home-20250119-1430 /home \
    --exclude '/home/*/.cache'

                       Original size      Compressed size    Deduplicated size
This archive:               12.48 GB              8.25 GB             85.32 MB
All archives:               24.93 GB             16.48 GB              2.23 GB

The second backup was 12.48 GB of data, but only 85 MB of new (deduplicated) data was actually stored.

Step 4: List and inspect archives

# List all archives
$ borg list /backup/borg-repo
home-20250118-1430       Sat, 2025-01-18 14:30:00
home-20250119-1430       Sun, 2025-01-19 14:30:00

# List files in a specific archive
$ borg list /backup/borg-repo::home-20250118-1430 | head -10
drwxr-xr-x user   user      0 Sat, 2025-01-18 14:00:00 home/user
-rw-r--r-- user   user   4521 Sat, 2025-01-18 13:45:00 home/user/.bashrc
...

# Show archive info
$ borg info /backup/borg-repo::home-20250118-1430

Step 5: Restore from a borg backup

# Restore entire archive to a directory
$ mkdir /tmp/borg-restore
$ cd /tmp/borg-restore
$ borg extract /backup/borg-repo::home-20250118-1430

# Restore a specific file
$ borg extract /backup/borg-repo::home-20250118-1430 home/user/.bashrc

# Restore with a dry run (list what would be extracted)
$ borg extract --dry-run --list /backup/borg-repo::home-20250118-1430

Step 6: Prune old backups (retention policy)

# Keep the last 7 daily, 4 weekly, 6 monthly, and 1 yearly backups
$ borg prune --stats --list \
    --keep-daily=7 \
    --keep-weekly=4 \
    --keep-monthly=6 \
    --keep-yearly=1 \
    /backup/borg-repo

# Always compact after pruning to reclaim disk space
$ borg compact /backup/borg-repo

borg Over SSH (Remote Backups)

# Initialize a remote repository
$ borg init --encryption=repokey ssh://backupuser@backup-server/~/borg-repo

# Create a backup to the remote repository
$ borg create --stats \
    ssh://backupuser@backup-server/~/borg-repo::home-$(date +%Y%m%d) \
    /home
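For unattended runs (cron, systemd timers), borg must not stop at a passphrase prompt. It reads BORG_REPO and BORG_PASSCOMMAND from the environment; the values below are placeholders for your own repository and passphrase file:

```shell
# Environment for non-interactive borg runs (placeholder values).
# BORG_PASSCOMMAND runs a command to obtain the passphrase, keeping it
# out of shell history and process listings.
export BORG_REPO='ssh://backupuser@backup-server/~/borg-repo'
export BORG_PASSCOMMAND='cat /root/.borg-passphrase'

# With BORG_REPO set, "::archive-name" refers to it implicitly, e.g.:
# borg create --stats ::home-$(date +%Y%m%d) /home
# borg prune --keep-daily=7 --keep-weekly=4 --keep-monthly=6
```

Keep the passphrase file readable by root only (chmod 600), and remember it must live outside the backup itself.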

Think About It: borg deduplicates at the chunk level, not the file level. What does this mean if you have a 1 GB database dump that changes by only 5 MB each day?


restic: Another Excellent Option

restic is similar to borg but with a different design philosophy. It supports multiple backends (local, S3, SFTP, Azure, GCS) out of the box.

# Install restic
$ sudo apt install restic        # Debian/Ubuntu
$ sudo dnf install restic        # Fedora/RHEL

# Initialize a repository
$ restic init --repo /backup/restic-repo
enter password for new repository:
enter password again:
created restic repository at /backup/restic-repo

# Create a backup
$ restic -r /backup/restic-repo backup /home
enter password for repository:
repository opened
Files:       12345 new
Added to the repo: 2.15 GiB

# List snapshots
$ restic -r /backup/restic-repo snapshots
ID        Time                 Host        Tags    Paths
a1b2c3d4  2025-01-18 14:30:00  myhost              /home

# Restore a snapshot
$ restic -r /backup/restic-repo restore a1b2c3d4 --target /tmp/restore

# Backup to S3
$ export AWS_ACCESS_KEY_ID=your_key
$ export AWS_SECRET_ACCESS_KEY=your_secret
$ restic -r s3:s3.amazonaws.com/my-backup-bucket init
$ restic -r s3:s3.amazonaws.com/my-backup-bucket backup /home

# Prune old snapshots
$ restic -r /backup/restic-repo forget --keep-daily 7 --keep-weekly 4 --prune

borg vs restic Quick Comparison

Feature           borgbackup                      restic
----------------  ------------------------------  ------------------------------
Deduplication     Yes (content-defined chunking)  Yes (content-defined chunking)
Encryption        AES-256-CTR                     AES-256-CTR
Compression       Multiple algorithms             zstd (since 0.14)
Cloud backends    SSH only (natively)             S3, Azure, GCS, SFTP, local
Speed             Generally faster for local      Better for cloud targets
Maturity          Longer track record             Newer, very active development

Both are excellent. If you back up to local disks or over SSH, borg is a great choice. If you back up to cloud storage, restic has more native backend support.


Backup Rotation and Retention

A common retention scheme:

┌──────────────────────────────────────────────────────┐
│               RETENTION POLICY EXAMPLE                │
│                                                       │
│  Daily backups:   Keep last 7 days                    │
│  Weekly backups:  Keep last 4 weeks                   │
│  Monthly backups: Keep last 12 months                 │
│  Yearly backups:  Keep last 3 years                   │
│                                                       │
│  Timeline:                                            │
│  ◄─ 7 days ──►◄── 4 weeks ──►◄── 12 months ──►        │
│  D D D D D D D W  W  W  W  M   M   M  ...  M  Y Y Y   │
│                                                       │
│  Old backups are pruned automatically.                │
│  Backups at transition boundaries are promoted.       │
└──────────────────────────────────────────────────────┘

For tar and rsync backups, you implement rotation manually:

# Delete tar backups older than 30 days
$ find /backup -name "*.tar.gz" -mtime +30 -delete

# Keep only the last 7 daily rsync backups using dated directories
# (head -n -7 is GNU head: print all but the last 7 lines;
#  xargs -r skips rm entirely when the list is empty)
$ ls -d /backup/daily-* | head -n -7 | xargs -r rm -rf

For borg and restic, use their built-in prune commands as shown above.


Testing Restores

A backup that has never been tested is not a backup. It is a hope.

# Test a tar restore
$ mkdir /tmp/restore-test
$ tar -xzf /backup/etc-20250118.tar.gz -C /tmp/restore-test
$ diff -r /etc /tmp/restore-test/etc --brief | head -20

# Test a borg restore
$ mkdir /tmp/borg-test
$ cd /tmp/borg-test
$ borg extract --dry-run --list /backup/borg-repo::home-20250118-1430
# If dry-run succeeds, the archive is readable

# Verify borg repository integrity
$ borg check /backup/borg-repo
$ borg check --verify-data /backup/borg-repo    # slower but thorough

# Test a restic restore
$ restic -r /backup/restic-repo check
$ restic -r /backup/restic-repo restore latest --target /tmp/restic-test

Schedule monthly restore tests. Add them to your calendar. Treat untested backups as nonexistent.
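Restore tests are easy to script. A sketch for tar archives (the helper name restore_test is ours, not a standard tool): extract into a scratch directory and report the result, so the test can run from cron and fail loudly.

```shell
# restore_test ARCHIVE -- extract a .tar.gz backup into a scratch
# directory and report whether the restore succeeded.
restore_test() {
    local archive="$1"
    local scratch
    scratch=$(mktemp -d)
    if tar -xzf "$archive" -C "$scratch" 2>/dev/null; then
        echo "OK: $archive restores cleanly (see $scratch)"
    else
        echo "FAIL: $archive did not restore" >&2
        return 1
    fi
}

# Example: test the newest /etc backup
# restore_test "$(ls -t /backup/etc-*.tar.gz | head -1)"
```

A fuller version would also diff a sample of restored files against the originals, as shown above.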


Automating Backups

With cron

# Edit root's crontab
$ sudo crontab -e
# Daily backup at 2 AM
0 2 * * * /usr/local/bin/backup.sh >> /var/log/backup.log 2>&1

# Weekly full backup Sunday at 1 AM
0 1 * * 0 /usr/local/bin/full-backup.sh >> /var/log/backup.log 2>&1

With systemd Timers

# Create the service unit
$ sudo vim /etc/systemd/system/backup.service
[Unit]
Description=Daily Backup
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup.sh
User=root

# Create the timer unit
$ sudo vim /etc/systemd/system/backup.timer
[Unit]
Description=Run backup daily at 2 AM

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true
RandomizedDelaySec=300

[Install]
WantedBy=timers.target

# Enable and start
$ sudo systemctl daemon-reload
$ sudo systemctl enable --now backup.timer

# Check timer status
$ systemctl list-timers | grep backup
NEXT                         LEFT     LAST                         PASSED   UNIT
Sun 2025-01-19 02:00:00 UTC  11h left Sat 2025-01-18 02:00:00 UTC 12h ago  backup.timer

The Persistent=true setting ensures that if the system was off when the timer should have fired, it runs the backup as soon as the system boots.


Debug This

An admin's backup script has been running for months but the restores fail. Here is the script:

#!/bin/bash
tar -czf /backup/nightly.tar.gz /var/www /etc /home 2>/dev/null

What is wrong?

Problems:

  1. Same filename every night: nightly.tar.gz is overwritten each run. There is only ever one backup. If the current one is corrupt, there is nothing to fall back on.
  2. Errors are silenced: 2>/dev/null hides all error messages. If /var/www is too large or permissions fail, the admin never knows.
  3. No verification: No check that the archive is valid after creation.
  4. No rotation: No old backups kept.

Fixed version:

#!/bin/bash
set -euo pipefail
DATE=$(date +%Y%m%d-%H%M%S)
BACKUP="/backup/nightly-${DATE}.tar.gz"
LOG="/var/log/backup-${DATE}.log"

echo "Starting backup at $(date)" | tee "$LOG"
tar -czf "$BACKUP" /var/www /etc /home 2>&1 | tee -a "$LOG"

# Verify the archive. The explicit if is needed: with set -e, a bare
# "tar -tzf" followed by a $? check would exit before logging the failure.
if tar -tzf "$BACKUP" > /dev/null 2>&1; then
    echo "Backup verified successfully" | tee -a "$LOG"
else
    echo "ERROR: Backup verification failed!" | tee -a "$LOG"
    exit 1
fi

# Remove backups older than 30 days
find /backup -name "nightly-*.tar.gz" -mtime +30 -delete

echo "Backup completed at $(date)" | tee -a "$LOG"

┌──────────────────────────────────────────────────────────┐
│                  What Just Happened?                      │
├──────────────────────────────────────────────────────────┤
│                                                           │
│  The 3-2-1 rule: 3 copies, 2 media types, 1 offsite.    │
│                                                           │
│  Backup types:                                            │
│  - Full: everything, every time (simple, large)           │
│  - Incremental: changes since last backup (small, complex)│
│  - Differential: changes since last full (middle ground)  │
│                                                           │
│  Tools:                                                   │
│  - tar: simple archives, great for /etc and small dirs    │
│  - rsync: efficient syncing, good for file-level backup   │
│  - borgbackup: dedup + encryption + compression           │
│  - restic: like borg with native cloud backend support    │
│                                                           │
│  Golden rules:                                            │
│  1. Automate backups (cron or systemd timers)             │
│  2. Test restores regularly                               │
│  3. Monitor for failures                                  │
│  4. Keep backups offsite                                  │
│  5. An untested backup is not a backup                    │
│                                                           │
└──────────────────────────────────────────────────────────┘

Try This

  1. tar basics: Create a compressed backup of /etc with a date-stamped filename. Extract it to /tmp and verify the contents match.

  2. rsync mirror: Use rsync to mirror a directory to a backup location. Modify some files, delete some files, and run rsync again with --delete. Verify the mirror is exact.

  3. borg workflow: Initialize a borg repository, create three archives (modifying some files between each), then prune to keep only the latest two. Verify pruning worked with borg list.

  4. Automate it: Write a backup script that uses either borg or rsync, add error checking, and schedule it with a systemd timer that runs daily at 3 AM.

  5. Bonus challenge: Set up borg to back up to a remote server over SSH. Configure a retention policy of 7 daily, 4 weekly, and 6 monthly backups. Write a script that creates the backup, prunes old archives, and compacts the repository, all in one run.