Infrastructure as Code Concepts

Why This Matters

Picture this: your team manages 50 servers. A junior admin logs into one and tweaks the Nginx config. A senior admin fixes a firewall rule on another -- but only that one. Someone else installs a newer version of Python on a staging box "just to test." Six months later, nobody can explain why staging works but production does not. The servers that were supposed to be identical have quietly drifted apart, and nobody documented a thing.

This is configuration drift, and it has caused countless outages, security breaches, and sleepless nights. Infrastructure as Code (IaC) exists to solve this problem. Instead of logging into servers and making changes by hand, you describe your desired infrastructure in files, store those files in version control, and let tools apply them automatically.

The result? Every change is tracked. Every environment is reproducible. Every server is consistent. And when something breaks, you can see exactly what changed, when, and who approved it.

This chapter introduces the concepts behind IaC -- the philosophy, the approaches, and the major tools. The next chapters put those tools to work.

Try This Right Now

You do not need any IaC tool installed yet to see the core idea. Open a terminal and create a simple shell script that describes how a server should be configured:

$ mkdir -p ~/iac-demo && cat > ~/iac-demo/setup-webserver.sh << 'EOF'
#!/bin/bash
# Desired state: nginx installed, running, and serving a custom page

# Ensure nginx is installed
if ! command -v nginx &>/dev/null; then
    sudo apt-get update && sudo apt-get install -y nginx
fi

# Ensure the index page has our content
echo "<h1>Managed by IaC</h1>" | sudo tee /var/www/html/index.html > /dev/null

# Ensure nginx is running
sudo systemctl enable --now nginx

echo "Server configured successfully."
EOF
$ chmod +x ~/iac-demo/setup-webserver.sh

Now look at what you just did: you wrote a file that describes how a server should look. You can run it on any fresh Ubuntu box and get the same result. You can store it in Git. You can review changes before applying them. That is the essence of Infrastructure as Code -- even if real IaC tools are far more sophisticated than a shell script.

What Is Infrastructure as Code?

Infrastructure as Code is the practice of managing and provisioning computing infrastructure through machine-readable definition files rather than through manual processes.

┌─────────────────────────────────────────────────────────────────┐
│                     THE OLD WAY                                  │
│                                                                  │
│   Admin → SSH into server → Type commands → Hope it works        │
│   Admin → SSH into next server → Try to remember same commands   │
│   Admin → Forget to update the third server                      │
│   Result: Snowflake servers, drift, mystery configs              │
│                                                                  │
├─────────────────────────────────────────────────────────────────┤
│                     THE IaC WAY                                  │
│                                                                  │
│   Admin → Write config file → Commit to Git → Tool applies it   │
│   Same file → Applied to ALL servers → Identical state           │
│   Change needed? → Edit file → Review → Merge → Auto-apply      │
│   Result: Consistent, versioned, reproducible infrastructure     │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

The Core Principles

1. Everything is a file. Server configs, network rules, user accounts, installed packages -- all described in text files (YAML, JSON, HCL, or domain-specific formats).

2. Version control is the source of truth. Those files live in Git. The Git history is your change log. If production broke after the last commit, you can diff and see exactly why.

3. Automation replaces manual steps. No one logs into a server to make changes. Tools read your files and make the infrastructure match.

4. Environments are reproducible. Need a new staging environment? Apply the same files. Need to rebuild after a disaster? Apply the same files.

Declarative vs. Imperative Approaches

This is the most important conceptual distinction in IaC. Understanding it will shape how you think about every tool.

Imperative: "Here Are the Steps"

The imperative approach tells the tool how to reach the desired state, step by step. Think of it as a recipe:

1. Install nginx
2. Copy the config file to /etc/nginx/nginx.conf
3. Start nginx
4. Open port 80 in the firewall

Shell scripts are imperative. They say "do this, then this, then this." The problem? If you run the script twice, it might try to install nginx again (maybe that is fine, maybe it is not). If nginx is already running, trying to start it might produce a warning or error. You have to handle every edge case yourself.

Declarative: "Here Is What I Want"

The declarative approach tells the tool what the desired state is, and the tool figures out how to get there:

# Declarative: describe the desired state
web_server:
  package: nginx
  state: installed

  config_file:
    path: /etc/nginx/nginx.conf
    source: templates/nginx.conf

  service:
    state: running
    enabled: true

  firewall:
    port: 80
    state: open

Run it once, it installs and configures everything. Run it again -- nothing happens, because the system already matches the desired state. That property has a name: idempotency.
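To make "the tool figures out how" concrete, here is a toy reconciler in plain shell. It is only a sketch: the two-keyword spec format (dir and line) and the converge function are invented for this example, and it works inside a temporary directory so it needs no root access.

```shell
#!/bin/sh
# Toy reconciler: reads a desired-state spec and changes only what differs.
# The spec format ("dir PATH" / "line FILE TEXT") is invented for this sketch.

converge() {
    # $1 = spec file, $2 = root directory that the spec paths are relative to
    while read -r kind path text; do
        case $kind in
            dir)
                if [ -d "$2/$path" ]; then
                    echo "ok: dir $path"
                else
                    mkdir -p "$2/$path" && echo "changed: dir $path"
                fi ;;
            line)
                if [ -f "$2/$path" ] && grep -qxF -- "$text" "$2/$path"; then
                    echo "ok: line in $path"
                else
                    printf '%s\n' "$text" >> "$2/$path" && echo "changed: line in $path"
                fi ;;
        esac
    done < "$1"
}

demo_root=$(mktemp -d)
cat > "$demo_root/spec" << 'EOF'
dir www
line www/index.html <h1>Managed by IaC</h1>
EOF

converge "$demo_root/spec" "$demo_root"   # first run reports "changed"
converge "$demo_root/spec" "$demo_root"   # second run reports "ok"
```

The spec says nothing about mkdir or appending; it only names the end state. The function inspects the current state and decides whether any action is needed, which is the same division of labor a real declarative tool gives you.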

┌───────────────────────────────────────────────────────────────┐
│                   IMPERATIVE vs DECLARATIVE                    │
│                                                                │
│  IMPERATIVE (How)           │  DECLARATIVE (What)              │
│  ─────────────────          │  ──────────────────              │
│  "Install nginx"            │  "nginx should be installed"     │
│  "Start the service"        │  "service should be running"     │
│  "Create user bob"          │  "user bob should exist"         │
│                             │                                  │
│  You manage the steps       │  Tool manages the steps          │
│  You handle edge cases      │  Tool handles edge cases         │
│  Order matters              │  Order usually does not matter   │
│                             │                                  │
│  Examples:                  │  Examples:                       │
│  - Shell scripts            │  - Ansible playbooks             │
│  - Chef recipes             │  - Terraform configs             │
│  - Some Ansible ad-hoc      │  - Puppet manifests              │
│    commands                 │  - Kubernetes manifests          │
│                             │                                  │
└───────────────────────────────────────────────────────────────┘

Most modern IaC tools lean declarative, though some (like Ansible) blend both approaches.

Think About It: You have a script that creates a user account. If the user already exists, the script fails. How would you make this script idempotent? What would a declarative tool do differently?

Idempotency: The Most Important Word in IaC

An operation is idempotent if running it once produces the same result as running it multiple times. This is not just a nice-to-have -- it is fundamental to reliable automation.

Consider these two approaches:

# NOT idempotent -- fails on second run
useradd deploy

# Idempotent -- safe to run repeatedly
id deploy &>/dev/null || useradd deploy

# NOT idempotent -- appends duplicate line every run
echo 'export PATH=/opt/app/bin:$PATH' >> /etc/profile

# Idempotent -- only adds if not present
# (single quotes keep $PATH literal, so grep and echo use the same text)
grep -qxF 'export PATH=/opt/app/bin:$PATH' /etc/profile \
  || echo 'export PATH=/opt/app/bin:$PATH' >> /etc/profile

IaC tools handle idempotency for you. When you write an Ansible task that says "ensure package nginx is installed," Ansible checks first and only installs if needed. When a Terraform configuration declares a server, Terraform compares it with its state file and creates only what is missing.

This is why IaC tools are better than raw scripts for managing infrastructure at scale. Writing truly idempotent shell scripts for complex configurations is surprisingly difficult. IaC tools have already solved those edge cases.
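One way to convince yourself that a step is idempotent is to run it twice and check whether the second run changed anything. The sketch below does that for a single file. The helper names (check_idempotent, append_line, ensure_line) and the APP_ENV line are made up for illustration.

```shell
#!/bin/sh
# Idempotency check: run a command twice and see whether the second run
# changes the target file. Function names here are made up for the sketch.

check_idempotent() {
    # $1 = file the command manages; remaining arguments = the command itself
    target=$1; shift
    "$@" || return 2
    before=$(cksum < "$target")
    "$@" || return 2
    after=$(cksum < "$target")
    [ "$before" = "$after" ]
}

# Blind append: adds the line on every run.
append_line() { echo 'export APP_ENV=demo' >> "$1"; }

# Check first, then act: adds the line only when it is missing.
ensure_line() {
    grep -qxF 'export APP_ENV=demo' "$1" 2>/dev/null \
        || echo 'export APP_ENV=demo' >> "$1"
}

profile=$(mktemp)
if check_idempotent "$profile" append_line "$profile"; then
    echo "append_line: idempotent"
else
    echo "append_line: NOT idempotent"
fi

: > "$profile"   # start again from an empty file
if check_idempotent "$profile" ensure_line "$profile"; then
    echo "ensure_line: idempotent"
else
    echo "ensure_line: NOT idempotent"
fi
```

A checksum of one file is a narrow test. Real configurations touch packages, services, and users as well, which is part of why hand-rolled idempotency gets hard quickly.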

Configuration Management vs. Provisioning

IaC tools generally fall into two categories, and understanding the distinction helps you pick the right one.

Provisioning Tools

Provisioning tools create infrastructure: servers, networks, load balancers, DNS records, storage volumes. They answer the question: "What machines and resources should exist?"

┌─────────────────────────────────────────────────┐
│              PROVISIONING                        │
│                                                  │
│   "Create 3 VMs with 4 CPU, 8GB RAM each"       │
│   "Create a network with subnet 10.0.1.0/24"     │
│   "Create a load balancer on port 443"           │
│   "Create a DNS record pointing to the LB"      │
│                                                  │
│   Tools: Terraform, OpenTofu, Pulumi             │
│                                                  │
└─────────────────────────────────────────────────┘

Configuration Management Tools

Configuration management tools configure existing machines: install packages, manage files, set up services, create users. They answer the question: "How should these machines be configured?"

┌─────────────────────────────────────────────────┐
│          CONFIGURATION MANAGEMENT                │
│                                                  │
│   "Install nginx on all web servers"             │
│   "Deploy this config file to /etc/nginx/"       │
│   "Ensure sshd is running and hardened"          │
│   "Create the deploy user with these SSH keys"   │
│                                                  │
│   Tools: Ansible, Puppet, Chef, Salt             │
│                                                  │
└─────────────────────────────────────────────────┘

They Work Together

In a real workflow, you often use both:

┌──────────────┐        ┌───────────────────┐        ┌────────────┐
│  Terraform   │ ────►  │   3 new servers    │ ────►  │  Ansible   │
│  provisions  │        │   are created      │        │  configures│
│  servers     │        │   (bare OS)        │        │  them all  │
└──────────────┘        └───────────────────┘        └────────────┘

Terraform creates the VMs; Ansible installs software and configures them. Some tools blur this line -- Ansible can do light provisioning, and Terraform can run scripts on new machines -- but the distinction remains useful.
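The hand-off between the two tools is often just a list of addresses. As a rough sketch, the function below turns a list of IPs into an INI-style Ansible inventory. The function name, the group name, and the sample addresses are all assumptions for this example.

```shell
#!/bin/sh
# Turn a list of IP addresses (one per line on stdin) into an INI-style
# Ansible inventory with a single group. The group name is the first argument.

make_inventory() {
    echo "[$1]"
    while IFS= read -r ip; do
        if [ -n "$ip" ]; then      # skip blank lines
            echo "$ip"
        fi
    done
}

# Demo with sample addresses standing in for freshly provisioned servers.
printf '10.0.1.11\n10.0.1.12\n10.0.1.13\n' | make_inventory webservers
```

In a real pipeline the addresses might come from something like terraform output -json web_ips | jq -r '.[]', where web_ips is a hypothetical output name and jq is assumed to be installed.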

Overview of Major IaC Tools

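Most of the following tools are open source. The exception is Terraform, which moved to the source-available BSL in 2023 and has an open-source fork in OpenTofu. Each tool has different strengths, and understanding them will help you choose wisely.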

Ansible

Type: Configuration management (with some provisioning)
Language: YAML (playbooks), Python (modules)
Approach: Declarative (mostly), push-based
Agent: None -- uses SSH
License: GPL v3

Ansible connects to machines over SSH and runs tasks. No agent software needs to be installed on the managed nodes, although most modules expect Python to be present there. You write playbooks in YAML that describe the desired state, and Ansible modules do the work. It is widely regarded as one of the easiest IaC tools to start with and is covered in depth in Chapter 68.

Terraform / OpenTofu

Type: Provisioning
Language: HCL (HashiCorp Configuration Language)
Approach: Declarative
Agent: None -- uses provider APIs
License: OpenTofu is MPL 2.0 (Terraform changed to BSL in 2023; OpenTofu is the community fork)

Terraform (and its open-source fork OpenTofu) excels at provisioning cloud and on-premises infrastructure. You describe resources in .tf files, and Terraform calculates a plan showing exactly what will be created, modified, or destroyed before applying changes. It maintains a state file that maps your config to real-world resources.
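The idea behind a plan can be sketched in a few lines of shell: compare what you want with what exists and print the difference without touching anything. This is a toy, not how Terraform works internally. It only compares resource names, whereas a real plan also detects modified attributes and orders changes by dependency. The plan function and the resource names are invented for the example.

```shell
#!/bin/sh
# Toy "plan": compare desired resource names with what currently exists
# and print the actions needed, without performing any of them.

plan() {
    # $1 = file listing desired resources, $2 = file listing existing ones
    sort -u "$1" > "$1.sorted"
    sort -u "$2" > "$2.sorted"
    comm -23 "$1.sorted" "$2.sorted" | sed 's/^/+ create  /'   # only in desired
    comm -13 "$1.sorted" "$2.sorted" | sed 's/^/- destroy /'   # only in actual
    rm -f "$1.sorted" "$2.sorted"
}

plan_dir=$(mktemp -d)
printf 'web-1\nweb-2\nweb-3\n' > "$plan_dir/desired"
printf 'web-1\nweb-old\n'      > "$plan_dir/actual"

plan "$plan_dir/desired" "$plan_dir/actual"
```

Here web-1 appears in both lists, so it is left alone. The output proposes creating web-2 and web-3 and destroying web-old, and a reviewer can read that before anyone approves the apply step.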

Puppet

Type: Configuration management
Language: Puppet DSL (Ruby-like)
Approach: Declarative, pull-based
Agent: Yes -- requires Puppet agent on managed nodes
License: Apache 2.0

Puppet was one of the first modern configuration management tools. Managed nodes run the Puppet agent, which periodically pulls the desired configuration from a Puppet server and applies it. Puppet uses its own domain-specific language to describe resources (packages, files, services, users). It has a steep learning curve but scales well for very large environments.

Chef

Type: Configuration management
Language: Ruby (recipes and cookbooks)
Approach: Imperative (mostly), pull-based
Agent: Yes -- requires Chef client on managed nodes
License: Apache 2.0

Chef uses Ruby-based "recipes" grouped into "cookbooks." Nodes run the Chef client, which pulls recipes from a Chef server. Chef appeals to teams that are comfortable with Ruby and prefer imperative, step-by-step logic. Its community has shrunk compared to Ansible and Terraform, but it remains in use in many enterprises.

SaltStack (Salt)

Type: Configuration management and remote execution
Language: YAML (state files), Python (modules)
Approach: Declarative (states) and imperative (remote execution)
Agent: Optional -- can use agent (minion) or SSH
License: Apache 2.0

Salt is fast thanks to its ZeroMQ-based messaging. It supports both agent-based (minion) and agentless (salt-ssh) modes. Salt states are written in YAML and describe the desired configuration. It is particularly strong for large-scale environments and real-time remote execution.

Comparison at a Glance

┌────────────┬──────────┬────────────┬────────┬────────────────┐
│ Tool       │ Type     │ Approach   │ Agent? │ Language       │
├────────────┼──────────┼────────────┼────────┼────────────────┤
│ Ansible    │ Config   │ Declarative│ No     │ YAML           │
│ Terraform  │ Provision│ Declarative│ No     │ HCL            │
│ OpenTofu   │ Provision│ Declarative│ No     │ HCL            │
│ Puppet     │ Config   │ Declarative│ Yes    │ Puppet DSL     │
│ Chef       │ Config   │ Imperative │ Yes    │ Ruby           │
│ Salt       │ Config   │ Both       │ Opt.   │ YAML           │
└────────────┴──────────┴────────────┴────────┴────────────────┘

Think About It: Your team has 200 servers. You want to ensure every server has the same SSH hardening config. Would you reach for a provisioning tool or a configuration management tool? Why?

Push vs. Pull Models

IaC tools deliver changes in two ways:

Push Model

The control machine connects to each managed node and pushes changes. The admin initiates the action.

┌─────────────────┐
│  Control Machine │
│  (your laptop)   │
│                  │
│  ansible-playbook│──── SSH ────► Node 1  ✓
│  site.yml        │──── SSH ────► Node 2  ✓
│                  │──── SSH ────► Node 3  ✓
└─────────────────┘

Examples: Ansible, Salt (in SSH mode)

Pros: Simple, no agents to install, you control when changes happen. Cons: Must have network access to all nodes from the control machine.
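Stripped to its core, the push model is a loop on the control machine. The sketch below walks an inventory file and runs one action per host. The push_all and fake_apply names are made up, and the demo action only prints a message so the example runs without any real servers.

```shell
#!/bin/sh
# Push-model sketch: walk an inventory file and run one action per host.
# The action is any command or function; it receives the host name as its
# last argument. Reading the inventory on file descriptor 3 keeps stdin free,
# so an action such as ssh cannot swallow the rest of the host list.

push_all() {
    inventory=$1; shift
    failed=0
    while IFS= read -r host <&3; do
        case $host in ''|'#'*) continue ;; esac   # skip blanks and comments
        if "$@" "$host"; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done 3< "$inventory"
    return "$failed"
}

# Stand-in action for the demo; it only prints what a real push would do.
fake_apply() { echo "would configure $1 over SSH"; }

hosts_file=$(mktemp)
printf 'node1\n# node2 is retired\nnode3\n' > "$hosts_file"

push_all "$hosts_file" fake_apply
```

A real action could wrap ssh, for example by piping a setup script to a remote shell. The loop also shows the main weakness of push: a host that is unreachable at that moment is simply reported as failed and stays unconfigured until the next run.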

Pull Model

Each managed node has an agent that periodically contacts a central server and pulls its configuration.

┌─────────────────┐
│  Central Server  │
│  (Puppet Server) │
│                  │◄──── Pull ──── Node 1 (agent)
│  Stores desired  │◄──── Pull ──── Node 2 (agent)
│  configurations  │◄──── Pull ──── Node 3 (agent)
└─────────────────┘
        Every 30 minutes, agents check in

Examples: Puppet, Chef. Salt in minion mode also relies on agents, but the master usually pushes jobs to them over a persistent connection.

Pros: Nodes self-correct automatically; works even if admin workstation is offline. Cons: Requires agent software on every node; more complex setup.
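A single agent check-in can be sketched as "fetch, compare, apply if different." In the example below the central server is just a directory and agent_run is a made-up function name. A real agent would fetch over the network and run on a timer.

```shell
#!/bin/sh
# Pull-model sketch: one "agent run" fetches the desired file from a central
# location and applies it only if the local copy differs.
# Here the "server" is just a directory; a real agent would fetch over HTTPS.

agent_run() {
    # $1 = desired file on the "server", $2 = local file to manage
    if cmp -s "$1" "$2" 2>/dev/null; then
        echo "in sync: $2"
    else
        cp "$1" "$2" && echo "applied: $2"
    fi
}

pull_dir=$(mktemp -d)
echo "PermitRootLogin no" > "$pull_dir/server_sshd.conf"

agent_run "$pull_dir/server_sshd.conf" "$pull_dir/node_sshd.conf"   # first check-in applies
echo "PermitRootLogin yes" > "$pull_dir/node_sshd.conf"              # someone edits the node by hand
agent_run "$pull_dir/server_sshd.conf" "$pull_dir/node_sshd.conf"   # next check-in self-corrects
agent_run "$pull_dir/server_sshd.conf" "$pull_dir/node_sshd.conf"   # nothing left to do
```

Scheduling agent_run from cron or a systemd timer would give the periodic check-in shown in the diagram. The manual edit in the middle is undone without anyone noticing it happened, which is the self-correcting behavior listed under Pros.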

The GitOps Workflow

GitOps takes IaC to its logical conclusion: Git is the single source of truth for infrastructure, and all changes flow through Git.

┌──────────────────────────────────────────────────────────────┐
│                      GitOps Workflow                          │
│                                                              │
│  1. Developer writes IaC change                              │
│                    │                                         │
│                    ▼                                         │
│  2. Push to feature branch                                   │
│                    │                                         │
│                    ▼                                         │
│  3. Open Pull/Merge Request                                  │
│                    │                                         │
│                    ▼                                         │
│  4. CI runs: lint, validate, plan                            │
│     (e.g., "terraform plan" or "ansible-playbook --check")   │
│                    │                                         │
│                    ▼                                         │
│  5. Team reviews the change AND the plan output              │
│                    │                                         │
│                    ▼                                         │
│  6. Merge to main branch                                     │
│                    │                                         │
│                    ▼                                         │
│  7. CD pipeline applies changes automatically                │
│     (e.g., "terraform apply" or "ansible-playbook")          │
│                    │                                         │
│                    ▼                                         │
│  8. Infrastructure updated, state recorded                   │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Why GitOps Works

Audit trail: Every change is a commit with an author, timestamp, and message.
Code review: Infrastructure changes get the same review as application code.
Rollback: Revert a commit and re-apply to undo a change.
Collaboration: Multiple people can work on infrastructure without stepping on each other.
Testing: CI pipelines can validate configs before they reach production.
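The cheapest CI check is a syntax gate that refuses to merge anything that does not even parse. The sketch below applies that idea to shell scripts using sh -n, which parses a file without executing it. The lint_scripts function and the two sample files are invented for the example.

```shell
#!/bin/sh
# CI gate sketch: syntax-check every *.sh file in a directory with "sh -n"
# (parse only, nothing is executed) and fail if any file is broken.

lint_scripts() {
    # $1 = directory to check
    lint_status=0
    for script in "$1"/*.sh; do
        [ -e "$script" ] || continue          # directory had no *.sh files
        if sh -n "$script" 2>/dev/null; then
            echo "PASS ${script##*/}"
        else
            echo "FAIL ${script##*/}"
            lint_status=1
        fi
    done
    return "$lint_status"
}

ci_dir=$(mktemp -d)
printf 'echo "hello"\n'                    > "$ci_dir/good.sh"
printf 'if true; then echo "missing fi"\n' > "$ci_dir/bad.sh"

lint_scripts "$ci_dir" || echo "lint failed: block the merge"
```

For scripts that rely on Bash features, bash -n is the matching check. A syntax gate catches only the crudest mistakes, so it complements the plan and dry-run steps rather than replacing them.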

Hands-On: Seeing Drift in Action

You do not need any IaC tool to understand drift. Let us simulate it.

Step 1: Create a "desired state" file:

$ mkdir -p ~/iac-demo/desired-state
$ cat > ~/iac-demo/desired-state/motd.txt << 'EOF'
Welcome to the production server.
Managed by IaC -- do not modify manually.
EOF

Step 2: "Deploy" it (simulate):

$ sudo cp ~/iac-demo/desired-state/motd.txt /etc/motd
$ cat /etc/motd
Welcome to the production server.
Managed by IaC -- do not modify manually.

Step 3: Simulate drift -- someone manually edits the file:

$ echo "Temporary fix by Bob - 2am" | sudo tee -a /etc/motd > /dev/null
$ cat /etc/motd
Welcome to the production server.
Managed by IaC -- do not modify manually.
Temporary fix by Bob - 2am

Step 4: Detect the drift:

$ diff ~/iac-demo/desired-state/motd.txt /etc/motd
2a3
> Temporary fix by Bob - 2am

Step 5: Remediate -- enforce the desired state:

$ sudo cp ~/iac-demo/desired-state/motd.txt /etc/motd
$ diff ~/iac-demo/desired-state/motd.txt /etc/motd

No output means the files match again. This detect-and-remediate cycle is exactly what IaC tools do automatically, at scale, across hundreds of machines.
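The same check scales from one file to a whole tree. The sketch below assumes the desired-state directory mirrors the layout of the live system (for example desired/etc/motd for /etc/motd), which differs slightly from the single motd.txt used above. The report_drift name and the sample files are made up, and the demo uses a temporary directory as a stand-in for the live system.

```shell
#!/bin/sh
# Drift report sketch: for every file under a desired-state directory, compare
# it with the same relative path under a target root and list what differs.
# Assumes the desired-state tree mirrors the layout of the target root.

report_drift() {
    # $1 = desired-state directory, $2 = root of the live system
    (cd "$1" && find . -type f) | sort | while IFS= read -r rel; do
        rel=${rel#./}
        cmp -s "$1/$rel" "$2/$rel" 2>/dev/null || echo "DRIFT: $rel"
    done
}

drift_demo=$(mktemp -d)
mkdir -p "$drift_demo/desired/etc" "$drift_demo/live/etc"
echo "Welcome to the production server." > "$drift_demo/desired/etc/motd"
echo "nameserver 10.0.0.2"               > "$drift_demo/desired/etc/resolv.conf"
cp "$drift_demo/desired/etc/"* "$drift_demo/live/etc/"

# Simulate the 2am manual fix on the "live" copy.
echo "Temporary fix by Bob - 2am" >> "$drift_demo/live/etc/motd"

report_drift "$drift_demo/desired" "$drift_demo/live"
```

Only the edited file is reported. An empty report means the live tree matches the desired state, and a missing file counts as drift too because cmp fails on it.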

IaC Best Practices

1. Store Everything in Version Control

Every IaC file belongs in Git. No exceptions. This includes:

Playbooks, manifests, and modules
Variable files (but NOT secrets -- see below)
Documentation about your infrastructure decisions

2. Never Store Secrets in Plain Text

Passwords, API keys, and private keys should never appear in plain text in your Git repository. Use tools designed for secrets:

ansible-vault for Ansible
Environment variables injected by CI/CD
External secret managers (HashiCorp Vault, etc.)
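For the environment-variable route, the script should fail fast when the secret is missing rather than fall back to a default. Here is a small sketch. DEPLOY_PASSWORD and the deploy function are hypothetical names, and the demo prints only the length of the secret, never its value.

```shell
#!/bin/sh
# Read a secret from the environment instead of hard-coding it.
# DEPLOY_PASSWORD is a hypothetical variable that a CI/CD system would inject.

# The function body is a subshell, so the abort triggered by ${VAR:?} ends
# only this function call and not the calling script.
deploy() (
    # ${VAR:?message} aborts with the message if VAR is unset or empty.
    : "${DEPLOY_PASSWORD:?DEPLOY_PASSWORD must be set}"
    echo "deploying with a password of ${#DEPLOY_PASSWORD} characters"
)

unset DEPLOY_PASSWORD
deploy 2>/dev/null || echo "refused: no secret provided"

DEPLOY_PASSWORD='example-only'    # in real use this comes from the CI/CD system
deploy
```

The script file can now live in Git safely because it holds no secret, only the name of the variable it expects.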

3. Use Modules and Roles for Reusability

Do not copy-paste configurations. Structure your code into reusable components:

Ansible: roles
Terraform: modules
Puppet: modules

4. Test Before Applying

Always preview changes before applying them:

ansible-playbook --check (dry run)
terraform plan (shows what will change)
puppet agent --noop (no-operation mode)
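If you still maintain plain shell scripts, you can give them a basic dry-run mode of their own. The sketch below routes commands through a small wrapper. The run function and the DRY_RUN variable are conventions assumed for this example, not part of any tool.

```shell
#!/bin/sh
# Dry-run sketch: route every state-changing command through run().
# With DRY_RUN=1 the command is only printed; otherwise it is executed.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "[dry-run] $*"
    else
        "$@"
    fi
}

dry_dir=$(mktemp -d)

DRY_RUN=1
run mkdir "$dry_dir/app"                      # printed only
if [ ! -d "$dry_dir/app" ]; then echo "dry run: nothing was created"; fi

DRY_RUN=0
run mkdir "$dry_dir/app"                      # executed
if [ -d "$dry_dir/app" ]; then echo "real run: directory exists"; fi
```

The wrapper only sees commands passed to it as arguments, so redirections such as > file still happen in dry-run mode. The built-in preview modes of real IaC tools do not have that blind spot.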

5. Make Small, Incremental Changes

Large changes are hard to review, hard to test, and hard to roll back. Make small, focused commits.

6. Use Environments (Dev, Staging, Production)

Test changes in dev, validate in staging, then promote to production. Use the same IaC code with different variable files for each environment.
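The "same code, different variable files" pattern looks roughly like this in shell. The file names, the WEB_COUNT and DOMAIN variables, and the describe_env function are all made up for the sketch.

```shell
#!/bin/sh
# Same code, different variable files: the logic stays identical and only the
# values loaded for the chosen environment change.
# File names and variables (WEB_COUNT, DOMAIN) are made up for this sketch.

env_dir=$(mktemp -d)
printf 'WEB_COUNT=1\nDOMAIN=staging.example.com\n' > "$env_dir/staging.env"
printf 'WEB_COUNT=3\nDOMAIN=www.example.com\n'     > "$env_dir/production.env"

describe_env() {
    # $1 = environment name; the subshell keeps loaded values from leaking out
    (
        . "$env_dir/$1.env" || exit 1
        echo "$1: $WEB_COUNT web server(s) for $DOMAIN"
    )
}

describe_env staging
describe_env production
```

Because the logic lives in one place, a fix tested in staging is the same code that later runs in production. Only the values differ.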

7. Document the "Why," Not the "What"

Your IaC files already describe what the infrastructure looks like. Comments and commit messages should explain why decisions were made.

8. Enforce Code Review for Infrastructure Changes

Infrastructure changes should go through the same pull request process as application code. A second pair of eyes catches mistakes before they reach production.

Distro Note: IaC tools work across distributions, but package names and service names differ. Ansible handles this with ansible_os_family facts. Terraform does not care -- it works at the API level. When writing IaC for mixed environments, always test against each target distribution.
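To see what that distribution detection involves, here is a rough shell equivalent that reads an os-release file and picks a package manager. The pkg_manager_for function is invented for this sketch, and the mapping covers only a few distribution families. Older RHEL-family releases use yum rather than dnf.

```shell
#!/bin/sh
# Pick a package manager from an os-release file (normally /etc/os-release).
# The mapping below is deliberately small and only illustrative.

pkg_manager_for() {
    # $1 = path to an os-release file; sourced in a subshell so that ID and
    # ID_LIKE do not leak into the calling script
    (
        . "$1" || exit 1
        case " ${ID:-} ${ID_LIKE:-} " in
            *" debian "*|*" ubuntu "*)            echo apt-get ;;
            *" rhel "*|*" fedora "*|*" centos "*) echo dnf ;;
            *" suse "*|*" opensuse "*)            echo zypper ;;
            *) echo unknown; exit 1 ;;
        esac
    )
}

# Demo with a sample file standing in for a Rocky Linux host.
os_demo=$(mktemp)
printf 'ID=rocky\nID_LIKE="rhel centos fedora"\n' > "$os_demo"
pkg_manager_for "$os_demo"
```

On a real host you would call pkg_manager_for /etc/os-release. Checking ID_LIKE as well as ID is what lets derivatives such as Rocky or Ubuntu map to their parent family without being listed one by one.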

Debug This

Your colleague wrote this shell script to configure servers and claims it is "Infrastructure as Code":

#!/bin/bash
apt-get install nginx
echo "server { listen 80; root /var/www; }" > /etc/nginx/sites-available/default
service nginx restart
useradd deploy
echo "deploy:s3cret" | chpasswd

What problems can you identify? Think about:

Is it idempotent? What happens if you run it twice?
Is it secure? What is wrong with the last two lines?
Is it portable? Will it work on CentOS?
Is it version-controlled? How would you track changes?
What happens if apt-get install fails but the script continues?

Answers:

Not idempotent: useradd deploy fails on the second run, and nginx is restarted on every run even when nothing changed. The echo is repeatable, but it silently overwrites any existing Nginx config without checking.
A plaintext password in a script is a security disaster. If this script is in Git, the password is in the history forever.
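Not portable: apt-get is Debian/Ubuntu only, and the sites-available directory is a Debian packaging convention. service is a legacy wrapper on systemd systems, where systemctl is the native command.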
The script itself can be in Git, but there is no structure for variables, roles, or environment separation.
No error handling: if apt-get fails, the script happily continues to configure a nonexistent Nginx.
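The last point is easy to demonstrate safely. In this small sketch, false stands in for a failed apt-get install, and each variant runs in its own child shell so nothing affects your session.

```shell
#!/bin/sh
# Without error handling, a failed step is ignored and the script carries on.
sh -c 'false; echo "configuring a server that has no nginx"'

# With "set -eu" the script stops at the first failing command (-e) and
# treats unset variables as errors (-u). "false" stands in for a failed install.
sh -c 'set -eu; false; echo "never printed"' || echo "stopped early, exit status $?"
```

The first command prints its message even though the step before it failed. The second stops before reaching its echo. In Bash scripts, set -euo pipefail is a common first line because it also catches failures inside pipelines.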

What Just Happened?

┌──────────────────────────────────────────────────────────────┐
│                    CHAPTER 67 RECAP                           │
│──────────────────────────────────────────────────────────────│
│                                                              │
│  Infrastructure as Code = managing infrastructure through    │
│  version-controlled definition files, not manual changes.    │
│                                                              │
│  Key concepts:                                               │
│  • Declarative ("what I want") vs Imperative ("how to do")   │
│  • Idempotency: run it 10 times, same result as once         │
│  • Configuration management (Ansible, Puppet, Chef, Salt)    │
│    configures existing machines                              │
│  • Provisioning (Terraform, OpenTofu) creates resources      │
│  • Push model: control machine sends changes                 │
│  • Pull model: agents fetch changes periodically             │
│  • GitOps: Git as the single source of truth                 │
│                                                              │
│  Best practices:                                             │
│  • Version control everything                                │
│  • Never store secrets in plain text                         │
│  • Test before applying (dry run)                            │
│  • Small, incremental, reviewed changes                      │
│  • Use environments (dev → staging → production)             │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Try This

Exercise 1: Write a Declarative Spec

Write a plain-text file (in YAML, JSON, or any format you like) that describes the desired state of a web server. Include: packages to install, files to create, services to enable, firewall rules to set. Do not worry about making it runnable -- focus on describing what the server should look like.

Exercise 2: Make a Script Idempotent

Take this non-idempotent script and rewrite it to be safe to run multiple times:

#!/bin/bash
mkdir /opt/myapp
useradd appuser
echo "export APP_HOME=/opt/myapp" >> /etc/profile

Hint: check before acting. Does the directory exist? Does the user exist? Is the line already in the file?

Exercise 3: Explore an IaC Tool

Install Ansible on your machine (covered in detail in Chapter 68) and run:

$ ansible localhost -m setup | head -50

This shows the "facts" Ansible gathers about your system -- the information it uses to make decisions. Notice how it detects your OS, package manager, network interfaces, and more.

Bonus Challenge

Set up a Git repository for infrastructure code. Create a directory structure like this:

infra/
├── inventory/
│   ├── dev
│   └── production
├── playbooks/
│   ├── webserver.yml
│   └── database.yml
├── roles/
│   └── common/
│       └── tasks/
│           └── main.yml
└── README.md

Even without writing any real Ansible code yet, this structure exercise teaches you how IaC projects are organized. You will fill it in during Chapter 68.

Linux Book: From First Boot to Production