Linux on Cloud

Why This Matters

A startup has a brilliant idea. Ten years ago, they would need to buy servers, rent data center space, run cables, set up cooling, and wait weeks before writing a single line of code. Today, they open a browser, click a few buttons, and have a Linux server running in under 60 seconds. They pay only for what they use. If their app goes viral and they need 100 servers instead of one, they can scale up in minutes, not months.

This is cloud computing, and it runs overwhelmingly on Linux. Over 90% of cloud workloads run on Linux. Every major cloud provider defaults to Linux instances. The tools that manage cloud infrastructure -- Terraform, Ansible, Kubernetes, Docker -- are Linux-native.

Whether you are deploying a personal project or managing enterprise infrastructure, understanding how Linux behaves in the cloud is essential. Cloud Linux has different networking, storage, initialization, and management patterns compared to bare-metal or VM installations.


Try This Right Now

Even without a cloud account, you can explore how cloud instances identify themselves:

# Check if you are running in a cloud environment
$ systemd-detect-virt
# Returns: kvm, xen, microsoft, oracle, amazon, google, or "none"

# Check for cloud-init (the standard cloud initialization tool)
$ which cloud-init && cloud-init status
# If installed: "status: done"

# See if a metadata service is available (cloud instances only)
$ curl -s --connect-timeout 2 http://169.254.169.254/ 2>/dev/null
# Returns metadata API on cloud instances, timeout on local machines

# Check your system's DMI information for cloud indicators
$ sudo dmidecode -s system-manufacturer 2>/dev/null
# Might show: "Amazon EC2", "Google Compute Engine", "Microsoft Corporation"

If you are on a local machine, these will mostly return empty or "none." That is fine -- it shows you what to look for on cloud instances.
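
The probes above can be folded into one quick "where am I running?" script. This is a sketch, not an exhaustive detector: systemd-detect-virt may be absent on non-systemd distributions, and it exits non-zero when it prints "none", so both cases are tolerated:

```shell
# Combine the checks into a single report. systemd-detect-virt may be
# missing entirely, or may exit non-zero while printing "none".
virt=$(systemd-detect-virt 2>/dev/null) || true
[ -n "$virt" ] || virt=unknown

if command -v cloud-init >/dev/null 2>&1; then
    ci=installed
else
    ci="not installed"
fi

echo "virtualization: $virt"
echo "cloud-init:     $ci"
```

On a laptop this typically reports "none" (or "unknown") and "not installed"; on a cloud instance you should see the hypervisor name and "installed".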


Cloud Computing Basics

Service Models

Cloud computing is divided into service models based on how much the provider manages:

┌──────────────────────────────────────────────────────────────┐
│                CLOUD SERVICE MODELS                           │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐    │
│  │                     YOU MANAGE                        │    │
│  │                                                      │    │
│  │  On-Premises    IaaS          PaaS          SaaS     │    │
│  │  ───────────    ────          ────          ────     │    │
│  │  Application    Application   Application   ----     │    │
│  │  Data           Data          Data          ----     │    │
│  │  Runtime        Runtime       ----          ----     │    │
│  │  Middleware     Middleware    ----          ----     │    │
│  │  OS             OS            ----          ----     │    │
│  │  ─ ─ ─ ─ ─ ─   ─ ─ ─ ─ ─    ─ ─ ─ ─ ─    ─ ─ ─   │    │
│  │  Virtualization ----          ----          ----     │    │
│  │  Servers        ----          ----          ----     │    │
│  │  Storage        ----          ----          ----     │    │
│  │  Networking     ----          ----          ----     │    │
│  │                                                      │    │
│  │                     PROVIDER MANAGES                  │    │
│  └──────────────────────────────────────────────────────┘    │
│                                                              │
│  IaaS = Infrastructure as a Service (VMs, networks, storage) │
│  PaaS = Platform as a Service (managed runtime/database)     │
│  SaaS = Software as a Service (just use the application)     │
│                                                              │
└──────────────────────────────────────────────────────────────┘

IaaS is where Linux knowledge matters most. You get a virtual machine, install an OS (usually Linux), and manage everything from the OS up.

Open Cloud Platforms

While the largest cloud providers are commercial, several open-source platforms let you build your own cloud:

  • OpenStack: The most mature open-source cloud platform, used by many telecom companies and research institutions
  • Apache CloudStack: Powers large cloud deployments
  • Proxmox VE: Combines KVM virtualization and LXC containers with a web interface
  • oVirt: Red Hat's open-source virtualization management platform

Cloud Images vs. Regular Installs

When you install Linux on a physical machine, you boot from an ISO, answer setup questions, and wait for package installation. Cloud instances do not work that way.

Cloud instances boot from cloud images -- pre-built, minimal OS images designed for instant deployment:

┌──────────────────────────────────────────────────────────────┐
│          TRADITIONAL INSTALL vs CLOUD IMAGE                   │
│                                                              │
│  TRADITIONAL INSTALL              CLOUD IMAGE                │
│  ───────────────────              ───────────                │
│  Boot from ISO                    Image already built        │
│  Answer setup wizard              Configuration via API      │
│  Install packages (10-30 min)     Boot in 30-60 seconds      │
│  Set hostname manually            Hostname set by cloud-init │
│  Configure network manually       Network auto-configured    │
│  Create users manually            SSH keys injected          │
│  Set up SSH manually              SSH ready immediately      │
│  Unique installation each time    Same image, every time     │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Cloud images are:

  • Minimal: No GUI, no unnecessary packages
  • Generic: Work on any cloud provider's hypervisor
  • Pre-configured for cloud-init: Accept configuration at first boot
  • Compressed: Small download size (300-800 MB)

# Download an Ubuntu cloud image (for local testing with QEMU/KVM)
$ wget https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img

# Check the image format
$ qemu-img info jammy-server-cloudimg-amd64.img
image: jammy-server-cloudimg-amd64.img
file format: qcow2
virtual size: 2.2 GiB
disk size: 626 MiB
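
To boot this image locally, cloud-init needs a data source. The simplest is NoCloud: two small text files (which must be named user-data and meta-data) packed into a seed ISO and attached as a CD-ROM. A sketch -- the `cloud-localds` helper ships in the cloud-image-utils package on Debian/Ubuntu; the package name may differ elsewhere:

```shell
# NoCloud data source: when no cloud metadata service is found,
# cloud-init falls back to reading user-data and meta-data from an
# attached disk labeled "cidata".
mkdir -p seed
cat > seed/user-data <<'EOF'
#cloud-config
hostname: local-test
EOF
cat > seed/meta-data <<'EOF'
instance-id: local-test-001
local-hostname: local-test
EOF
# Pack the seed and boot (shown for reference, not run here):
#   cloud-localds seed.iso seed/user-data seed/meta-data
#   qemu-system-x86_64 -m 1024 -nographic \
#       -drive file=jammy-server-cloudimg-amd64.img,format=qcow2 \
#       -cdrom seed.iso
```

This is the same mechanism real clouds use, just with files instead of a metadata service, which makes it a convenient way to test user data before paying for an instance.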

cloud-init: Instance Initialization

cloud-init is the industry standard for initializing cloud instances on first boot. Nearly every Linux cloud image includes it. It reads configuration from the cloud provider's metadata service and sets up the instance.

What cloud-init Does

┌──────────────────────────────────────────────────────────────┐
│                 cloud-init STAGES                             │
│                                                              │
│  Instance boots for the first time:                          │
│                                                              │
│  1. DETECT    → Identify cloud platform (AWS, GCP, etc.)     │
│  2. INIT      → Set hostname, configure networking           │
│  3. CONFIG    → Install packages, write files, run commands  │
│  4. FINAL     → Run user scripts, signal ready               │
│                                                              │
│  Data sources:                                               │
│  • Metadata service (http://169.254.169.254/)                │
│  • User-data (custom scripts/configs)                        │
│  • Vendor-data (provider defaults)                           │
│                                                              │
└──────────────────────────────────────────────────────────────┘

cloud-init Configuration (User Data)

When launching a cloud instance, you provide "user data" -- a cloud-init configuration file that runs on first boot:

#cloud-config

# Set the hostname
hostname: web-server-01

# Create users
users:
  - name: deploy
    groups: sudo
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... deploy@laptop

# Install packages
package_update: true
packages:
  - nginx
  - htop
  - curl
  - fail2ban

# Write files
write_files:
  - path: /var/www/html/index.html
    content: |
      <h1>Server provisioned by cloud-init</h1>
    owner: www-data:www-data
    permissions: '0644'

# Run commands on first boot
runcmd:
  - systemctl enable --now nginx
  - systemctl enable --now fail2ban
  - echo "Instance provisioned at $(date)" >> /var/log/provision.log

# Configure timezone
timezone: UTC

Checking cloud-init Status

# Check if cloud-init finished successfully
$ cloud-init status
status: done

# See detailed cloud-init output
$ cloud-init status --long

# View cloud-init logs
$ cat /var/log/cloud-init-output.log

# See what cloud-init configured
$ cloud-init query instance_id
$ cloud-init query region
$ cloud-init query local_hostname

Think About It: cloud-init runs only on first boot by default. If you change the user data and reboot, the changes will not apply. How would you handle configuration changes after the initial boot? (Hint: think about what we learned in Chapters 67 and 68.)


The Metadata Service

Every cloud instance has access to a metadata service at a well-known IP address: 169.254.169.254. This link-local address is routed internally by the cloud provider and provides information about the instance.

# Query the metadata service (example for a generic cloud instance)
$ curl -s http://169.254.169.254/latest/meta-data/

# Common metadata endpoints (vary by provider):
$ curl -s http://169.254.169.254/latest/meta-data/instance-id
$ curl -s http://169.254.169.254/latest/meta-data/local-ipv4
$ curl -s http://169.254.169.254/latest/meta-data/public-ipv4
$ curl -s http://169.254.169.254/latest/meta-data/hostname
$ curl -s http://169.254.169.254/latest/meta-data/instance-type

# Retrieve user-data (your cloud-init config)
$ curl -s http://169.254.169.254/latest/user-data

Safety Warning: The metadata service can expose sensitive information, including temporary security credentials. If your instance runs a web application, ensure that users cannot proxy requests to 169.254.169.254. This is a well-known attack vector called SSRF (Server-Side Request Forgery). Many cloud providers now offer IMDSv2, which requires a token for metadata access.
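
On AWS, the token-based IMDSv2 flow looks like the sketch below. The headers are the AWS-documented ones; off-cloud, both requests simply time out, so failures are tolerated here:

```shell
# IMDSv2 sketch: PUT a request for a short-lived session token, then
# present that token on every metadata request. Plain GETs without the
# token are refused when IMDSv2 is enforced.
TOKEN=$(curl -s --connect-timeout 2 -X PUT \
    "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600") || TOKEN=""
curl -s --connect-timeout 2 \
    -H "X-aws-ec2-metadata-token: $TOKEN" \
    "http://169.254.169.254/latest/meta-data/instance-id" || true
```

Because SSRF attacks typically can only make simple GET requests, requiring an initial PUT blocks most of them.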


Cloud Networking

Cloud networking differs significantly from physical networking:

┌──────────────────────────────────────────────────────────────┐
│              CLOUD NETWORKING CONCEPTS                        │
│                                                              │
│  VPC (Virtual Private Cloud)                                 │
│  └── Your isolated network in the cloud                      │
│      ├── Subnet A (10.0.1.0/24) - Public                     │
│      │   ├── Instance 1 (10.0.1.10) + Public IP              │
│      │   └── Instance 2 (10.0.1.11) + Public IP              │
│      ├── Subnet B (10.0.2.0/24) - Private                    │
│      │   ├── Database (10.0.2.10) - No public IP             │
│      │   └── Cache (10.0.2.11) - No public IP                │
│      ├── Internet Gateway - connects VPC to internet         │
│      ├── NAT Gateway - lets private instances reach out      │
│      └── Route Tables - control traffic flow                 │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Security Groups vs. Firewalls

Traditional Linux firewalls (iptables, nftables) work at the OS level. Cloud security groups work at the hypervisor level, before traffic reaches your instance:

┌──────────────────────────────────────────────────────────────┐
│           SECURITY GROUPS vs IPTABLES                        │
│                                                              │
│  SECURITY GROUP (cloud-level)                                │
│  ─────────────────────────────                               │
│  • Managed through cloud API/console                         │
│  • Stateful (return traffic auto-allowed)                    │
│  • Applied per-instance or per-network interface             │
│  • Default: deny all inbound, allow all outbound             │
│  • Cannot see or modify from inside the instance             │
│                                                              │
│  IPTABLES/NFTABLES (OS-level)                                │
│  ──────────────────────────────                              │
│  • Managed from inside the instance                          │
│  • Additional layer of defense                               │
│  • Can do things security groups cannot (rate limiting, etc.)│
│                                                              │
│  Best practice: USE BOTH. Security groups for broad rules,   │
│  OS firewall for fine-grained control.                       │
│                                                              │
└──────────────────────────────────────────────────────────────┘
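
As a concrete example of "things security groups cannot do", here is a sketch of an nftables snippet that rate-limits new SSH connections. The filename is illustrative, and whether to drop or log over-limit traffic is a policy choice; review before loading with `sudo nft -f`:

```shell
# Rate-limit new SSH connections at the OS level -- finer-grained
# control than a typical cloud security group rule offers.
cat > ratelimit.nft <<'EOF'
table inet filter {
    chain input {
        type filter hook input priority 0; policy accept;
        tcp dport 22 ct state new limit rate 10/minute accept
        tcp dport 22 ct state new drop
    }
}
EOF
# Apply on a real instance (not run here):
#   sudo nft -f ratelimit.nft
```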

Cloud Storage

Cloud storage comes in several types, and understanding them is critical:

Instance Storage (Ephemeral)

  • Temporary storage attached directly to the host
  • Data is lost when the instance stops or terminates
  • Very fast (local SSD)
  • Use for: temp files, caches, scratch data

Block Storage (Persistent)

  • Network-attached volumes (like a virtual hard drive)
  • Persists independently of the instance
  • Can be detached and reattached to different instances
  • Use for: databases, application data, anything that must survive reboots

# On a cloud instance, check your block devices
$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
xvda    202:0    0   20G  0 disk
└─xvda1 202:1    0   20G  0 part /
xvdb    202:16   0  100G  0 disk                  ← Additional volume

# Format and mount an additional volume
$ sudo mkfs.ext4 /dev/xvdb
$ sudo mkdir /data
$ sudo mount /dev/xvdb /data

# Make it persistent
$ echo '/dev/xvdb /data ext4 defaults 0 2' | sudo tee -a /etc/fstab
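
One caution about that fstab line: device names like /dev/xvdb are not stable -- the same volume may appear as /dev/nvme1n1 on a different instance type. Mounting by filesystem UUID with the nofail option is safer. A sketch with a placeholder UUID; on a real instance, get the actual value with `sudo blkid -s UUID -o value /dev/xvdb`:

```shell
# Build an fstab entry keyed to the filesystem UUID. 'nofail' means a
# detached volume will not drop the instance into emergency mode.
UUID="11111111-2222-3333-4444-555555555555"   # placeholder value
entry="UUID=$UUID /data ext4 defaults,nofail 0 2"
echo "$entry"
# On a real instance: echo "$entry" | sudo tee -a /etc/fstab
```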

Object Storage

  • Stores files as objects with metadata
  • Accessed via HTTP API (not mounted as a filesystem)
  • Virtually unlimited capacity
  • Use for: backups, static assets, log archives, media files

# Using s3cmd (open-source S3-compatible client)
$ sudo apt install -y s3cmd
$ s3cmd --configure

# Upload a file
$ s3cmd put backup.tar.gz s3://my-bucket/backups/

# List bucket contents
$ s3cmd ls s3://my-bucket/backups/

# Download a file
$ s3cmd get s3://my-bucket/backups/backup.tar.gz

Auto-Scaling Concepts

One of the most powerful cloud features is automatic scaling -- adding or removing instances based on demand:

┌──────────────────────────────────────────────────────────────┐
│                    AUTO-SCALING                               │
│                                                              │
│  Low traffic (night):                                        │
│  Load Balancer ──► [Instance 1] [Instance 2]                 │
│                    (2 instances, 15% CPU each)               │
│                                                              │
│  Normal traffic (day):                                       │
│  Load Balancer ──► [Inst 1] [Inst 2] [Inst 3] [Inst 4]      │
│                    (4 instances, 40% CPU each)               │
│                                                              │
│  Traffic spike (sale event):                                 │
│  Load Balancer ──► [1] [2] [3] [4] [5] [6] [7] [8]         │
│                    (8 instances, 60% CPU each)               │
│                                                              │
│  Scaling rules:                                              │
│  • Scale up when avg CPU > 70% for 5 minutes                │
│  • Scale down when avg CPU < 30% for 10 minutes             │
│  • Minimum: 2 instances (for redundancy)                     │
│  • Maximum: 20 instances (cost control)                      │
│                                                              │
└──────────────────────────────────────────────────────────────┘

For auto-scaling to work, your application must be stateless -- any instance can handle any request. Shared state goes in a database or cache, not on local disk.
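
The scaling rules in the diagram are just threshold comparisons. A toy sketch (the function name is made up, and real autoscalers also average metrics over time windows and enforce cooldown periods between actions):

```shell
# Decide whether to scale, given average CPU (%) and current instance
# count, using the thresholds and min/max bounds from the diagram.
decide_scale() {
    cpu=$1; count=$2
    min=2; max=20
    if [ "$cpu" -gt 70 ] && [ "$count" -lt "$max" ]; then
        echo up
    elif [ "$cpu" -lt 30 ] && [ "$count" -gt "$min" ]; then
        echo down
    else
        echo hold
    fi
}
decide_scale 85 4    # -> up
decide_scale 15 2    # -> hold (already at the 2-instance minimum)
decide_scale 50 6    # -> hold (within thresholds)
```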


Hands-On: Infrastructure with Terraform

Terraform (and its open-source fork OpenTofu) describes cloud infrastructure in code. Here is a taste of how it works.

Install Terraform/OpenTofu

# Install OpenTofu (open-source Terraform fork)
$ curl -fsSL https://get.opentofu.org/install-opentofu.sh | sudo bash -s -- --install-method standalone

# Or install Terraform
$ wget https://releases.hashicorp.com/terraform/1.7.0/terraform_1.7.0_linux_amd64.zip
$ unzip terraform_1.7.0_linux_amd64.zip
$ sudo mv terraform /usr/local/bin/

Terraform Basics

Terraform uses HCL (HashiCorp Configuration Language) to describe resources:

# main.tf -- Example infrastructure definition

# Configure the provider
terraform {
  required_providers {
    # This is an example -- specific providers vary by cloud
    libvirt = {
      source = "dmacvicar/libvirt"
    }
  }
}

# Define a virtual machine
resource "libvirt_domain" "web_server" {
  name   = "web-server-01"
  memory = "2048"
  vcpu   = 2

  disk {
    volume_id = libvirt_volume.web_disk.id
  }

  network_interface {
    network_name = "default"
  }

  cloudinit = libvirt_cloudinit_disk.web_init.id
}

# Define a cloud-init disk
resource "libvirt_cloudinit_disk" "web_init" {
  name      = "web-init.iso"
  user_data = <<-EOF
    #cloud-config
    hostname: web-server-01
    packages:
      - nginx
    runcmd:
      - systemctl enable --now nginx
  EOF
}

# Define the disk volume
resource "libvirt_volume" "web_disk" {
  name   = "web-disk.qcow2"
  pool   = "default"
  source = "/var/lib/libvirt/images/ubuntu-cloud.img"
  format = "qcow2"
}

The Terraform Workflow

# Initialize (download providers)
$ terraform init

# Preview changes
$ terraform plan

# Apply changes (create infrastructure)
$ terraform apply

# Destroy infrastructure when done
$ terraform destroy

┌──────────────────────────────────────────────────────────────┐
│                TERRAFORM WORKFLOW                             │
│                                                              │
│  terraform init ──► Download providers and modules           │
│         │                                                    │
│         ▼                                                    │
│  terraform plan ──► Show what will be created/changed        │
│         │                                                    │
│         ▼                                                    │
│  terraform apply ──► Create/modify infrastructure            │
│         │                                                    │
│         ▼                                                    │
│  terraform.tfstate ──► State file (tracks what exists)       │
│                                                              │
│  Later:                                                      │
│  terraform destroy ──► Remove all managed infrastructure     │
│                                                              │
└──────────────────────────────────────────────────────────────┘

The state file (terraform.tfstate) is critical -- it maps your configuration to real-world resources. In a team setting, store it in a shared backend (S3, Consul, etc.), never in Git.
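
A practical corollary for version control -- a sketch of a .gitignore for a Terraform/OpenTofu repository (the dependency lock file, .terraform.lock.hcl, is by contrast usually committed):

```shell
# Keep state files (which can contain secrets) and the local provider
# cache out of Git.
cat > .gitignore <<'EOF'
*.tfstate
*.tfstate.*
.terraform/
crash.log
EOF
```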

Distro Note: Terraform/OpenTofu work identically regardless of your local Linux distribution. The provider plugins handle cloud-specific differences. You write the same HCL whether you are on Ubuntu, Fedora, or Arch.


Cloud CLI Tools

Each cloud platform has a CLI tool. For open-source and self-hosted clouds:

# OpenStack CLI
$ pip install python-openstackclient
$ openstack server list
$ openstack server create --flavor m1.small --image ubuntu-22.04 my-server

# Proxmox (via API)
$ curl -s https://proxmox:8006/api2/json/nodes/pve/qemu \
    -H "Authorization: PVEAPIToken=user@pam!token=uuid"

For working with cloud-compatible storage (S3-compatible APIs):

# MinIO client (open-source, works with any S3-compatible storage)
$ wget https://dl.min.io/client/mc/release/linux-amd64/mc
$ chmod +x mc && sudo mv mc /usr/local/bin/

# Configure a connection
$ mc alias set myminio http://minio-server:9000 ACCESS_KEY SECRET_KEY

# Basic operations
$ mc ls myminio/
$ mc mb myminio/my-bucket
$ mc cp file.txt myminio/my-bucket/

Debug This

A cloud instance launched with the following cloud-init user data, but nginx is not running and the deploy user was not created:

#cloud-config

users:
  - name: deploy
    groups: sudo
    ssh-authorized-keys:
      - ssh-rsa AAAA... deploy@laptop

packages:
  - nginx

runcmd:
  - systemctl enable nginx
  - systemctl start nginx

The cloud-init log shows: status: done with no errors. What went wrong?

Answer: The YAML key for the SSH keys is wrong. It should be ssh_authorized_keys (underscores), not ssh-authorized-keys (hyphens). Older cloud-init releases silently ignore keys they do not recognize; newer ones may log a schema or deprecation warning, but still report status: done. The deploy user was actually created -- just without any SSH key, so there was no way to log in as deploy and it appeared the user did not exist.

Also, runcmd should combine enable and start: systemctl enable --now nginx. Note that ordering is not normally the issue here: cloud-init installs packages during its config stage, before runcmd runs in the final stage. If nginx is truly absent, suspect a failed or skipped package install rather than a race with runcmd. Adding package_update: true is still worthwhile, because on a fresh cloud image a stale package index can make installs fail.

Always check /var/log/cloud-init-output.log for detailed diagnostics.
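
The schema validator can catch exactly this class of mistake before launch. A sketch, assuming cloud-init is installed locally (the filename is illustrative, and the exact wording of the warning varies by cloud-init version):

```shell
# Reproduce the buggy user data locally, then validate it.
cat > buggy.yml <<'EOF'
#cloud-config
users:
  - name: deploy
    groups: sudo
    ssh-authorized-keys:
      - ssh-rsa AAAA... deploy@laptop
EOF
# Validate (shown for reference; requires cloud-init installed):
#   cloud-init schema --config-file buggy.yml
```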


What Just Happened?

┌──────────────────────────────────────────────────────────────┐
│                    CHAPTER 73 RECAP                           │
│──────────────────────────────────────────────────────────────│
│                                                              │
│  Cloud computing lets you create Linux infrastructure        │
│  on demand, pay for what you use, and scale instantly.       │
│                                                              │
│  Key concepts:                                               │
│  • IaaS/PaaS/SaaS: how much the provider manages            │
│  • Cloud images: pre-built, minimal, instant-boot            │
│  • cloud-init: configures instances on first boot            │
│  • Metadata service (169.254.169.254): instance info         │
│  • Security groups: cloud-level firewall                     │
│  • Block storage: persistent virtual disks                   │
│  • Object storage: S3-compatible file storage                │
│  • Auto-scaling: add/remove instances based on demand        │
│                                                              │
│  Tools:                                                      │
│  • Terraform/OpenTofu: provision cloud resources as code     │
│  • cloud-init: initialize instances declaratively            │
│  • OpenStack, Proxmox: open-source cloud platforms           │
│                                                              │
│  Best practices:                                             │
│  • Use cloud-init for initial setup, IaC for ongoing config  │
│  • Layer security: security groups + OS firewall             │
│  • Separate persistent data to block/object storage          │
│  • Design for failure: instances can disappear               │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Try This

Exercise 1: cloud-init Local Testing

You can test cloud-init configurations locally without a cloud account:

$ sudo apt install -y cloud-init
$ cloud-init schema --config-file your-config.yml   # older releases: cloud-init devel schema

Write a cloud-init config that installs three packages, creates a user, and writes a custom /etc/motd. Validate it with the schema tool.

Exercise 2: Terraform Exploration

Install OpenTofu or Terraform and explore the CLI:

$ tofu version    # or terraform version
$ tofu providers  # List available providers

Write a simple .tf file (it does not need to connect to a real cloud). Run tofu init and tofu plan to see how Terraform processes your configuration.

Exercise 3: Metadata Service

If you have access to any cloud instance, query the metadata service and document all the information it provides. Think about which fields might be sensitive.

Exercise 4: Cloud Image Investigation

Download an Ubuntu or AlmaLinux cloud image. Mount it locally using qemu-nbd or guestmount and explore its contents. Compare the installed packages and filesystem size to a regular installation.

Bonus Challenge

Set up a local cloud using Proxmox VE or OpenStack DevStack in a VM. Create a Linux instance using a cloud image, pass it cloud-init user data, and verify the configuration was applied. Then manage the same infrastructure using Terraform with the appropriate provider.