Writing & Managing Services

Why This Matters

You have written an application -- maybe a Python web server, a Go API, or a Node.js background worker. It runs fine when you type the command in a terminal. But what happens when you log out? It dies. What happens when it crashes at 3 AM? It stays dead. What happens when the server reboots? Nobody starts it.

This is where writing your own systemd service unit comes in. A unit file is a simple text file that tells systemd how to start, stop, supervise, and restart your application. No shell script gymnastics. No screen sessions. No nohup hacks. Just a declarative configuration that systemd follows reliably, every time.

This chapter teaches you to write unit files from scratch, understand every directive that matters, configure restart policies, manage dependencies, and replace cron jobs with systemd timers.


Try This Right Now

Let us create a trivially simple service in under a minute:

# Create a tiny script
sudo tee /usr/local/bin/hello-service.sh << 'SCRIPT'
#!/bin/bash
while true; do
    echo "Hello from my service at $(date)"
    sleep 10
done
SCRIPT
sudo chmod +x /usr/local/bin/hello-service.sh

# Create a unit file for it
sudo tee /etc/systemd/system/hello.service << 'UNIT'
[Unit]
Description=My Hello Service

[Service]
ExecStart=/usr/local/bin/hello-service.sh

[Install]
WantedBy=multi-user.target
UNIT

# Load, start, and watch it
sudo systemctl daemon-reload
sudo systemctl start hello.service
journalctl -u hello.service -f

You should see "Hello from my service" messages appearing every 10 seconds. Press Ctrl+C to stop watching the log. The service keeps running.

# Clean up when done
sudo systemctl stop hello.service
sudo systemctl disable hello.service
sudo rm /etc/systemd/system/hello.service
sudo rm /usr/local/bin/hello-service.sh
sudo systemctl daemon-reload

Unit File Anatomy

Every systemd unit file has the same basic structure: sections (denoted by square brackets) containing key-value directives.

+---------------------------------------------------+
|  [Unit]           <-- Metadata & Dependencies     |
|  Description=...                                   |
|  After=...                                         |
|  Requires=...                                      |
|                                                    |
|  [Service]        <-- How to Run                  |
|  Type=...                                          |
|  ExecStart=...                                     |
|  Restart=...                                       |
|                                                    |
|  [Install]        <-- Boot Integration            |
|  WantedBy=...                                      |
+---------------------------------------------------+

The [Unit] Section

This section describes the unit and defines its relationships to other units.

[Unit]
Description=My Application Server
Documentation=https://example.com/docs
After=network.target postgresql.service
Requires=postgresql.service
Wants=redis.service

Directive              Purpose
Description=           Human-readable name shown in systemctl status
Documentation=         URL or man page reference
After=                 Start this unit after the listed units
Before=                Start this unit before the listed units
Requires=              Hard dependency -- if the required unit fails, this unit fails too
Wants=                 Soft dependency -- if the wanted unit fails, this unit still starts
BindsTo=               Like Requires, but also stops this unit if the bound unit stops
Conflicts=             Cannot run at the same time as the listed units
ConditionPathExists=   Only start if the given path exists

After vs Requires: A Critical Distinction

These two directives do different things, and confusing them is a very common mistake:

  • After= controls ordering -- "start me after X has started"
  • Requires= controls dependency -- "if X fails, I fail too"

You almost always want both together:

# WRONG: ordering without dependency
After=postgresql.service
# PostgreSQL starts first, but if it fails, your app starts anyway

# WRONG: dependency without ordering
Requires=postgresql.service
# They might start at the same time (parallel), causing race conditions

# RIGHT: both together
After=postgresql.service
Requires=postgresql.service
# PostgreSQL starts first, AND your app won't start if PostgreSQL fails

Think About It: When would you use Wants= instead of Requires=? Think of a case where you would prefer your application to start even if an optional dependency failed.

The [Service] Section

This is where you define how the service actually runs.

[Service]
Type=simple
User=appuser
Group=appgroup
WorkingDirectory=/opt/myapp
Environment=NODE_ENV=production
EnvironmentFile=/opt/myapp/.env
ExecStartPre=/opt/myapp/check-config.sh
ExecStart=/opt/myapp/server --port 8080
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/bin/kill -TERM $MAINPID
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal

We will explore each important directive in detail below.

The [Install] Section

This section defines how the unit integrates with the boot process.

[Install]
WantedBy=multi-user.target

Directive      Purpose
WantedBy=      When enabled, add this unit to the listed target's "wants"
RequiredBy=    When enabled, add this unit to the listed target's "requires"
Also=          When enabling this unit, also enable the listed units
Alias=         Additional names for this unit

WantedBy=multi-user.target is the most common value. It means "start this service when the system reaches multi-user mode" -- which is the normal boot target for servers.
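
What enabling a unit with WantedBy= amounts to can be imitated by hand. The sketch below does so inside a scratch directory, so it is safe to run without root. The scratch layout is an assumption for illustration only; on a real system, systemctl enable manages the link under /etc/systemd/system for you.

```shell
#!/bin/sh
# Imitate `systemctl enable hello.service` in a throwaway directory tree.
# Real systemd uses /etc/systemd/system; the scratch root is illustrative.
root=$(mktemp -d)
unit_dir="$root/etc/systemd/system"
wants_dir="$unit_dir/multi-user.target.wants"
mkdir -p "$wants_dir"

# A minimal unit with an [Install] section.
cat > "$unit_dir/hello.service" << 'UNIT'
[Unit]
Description=My Hello Service

[Service]
ExecStart=/usr/local/bin/hello-service.sh

[Install]
WantedBy=multi-user.target
UNIT

# Enabling boils down to a symlink inside <target>.wants/.
ln -s "$unit_dir/hello.service" "$wants_dir/hello.service"

ls -l "$wants_dir"
```

Disabling is the reverse: the symlink is removed, and the unit file itself stays in place.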


ExecStart, ExecStop, and ExecReload

ExecStart

The most important directive. This is the command that starts your service:

# Simple command
ExecStart=/usr/bin/python3 /opt/myapp/server.py

# With arguments
ExecStart=/usr/bin/node /opt/myapp/index.js --port 3000

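# IMPORTANT: Use absolute paths for the executable and its script. This will NOT work reliably:
# ExecStart=python3 server.py    <-- WRONG

Rules for ExecStart:

  • Use an absolute path to the executable. Older systemd releases require it; newer ones resolve a bare command name against a fixed search path, but an absolute path is the portable choice
  • Unless Type=oneshot is set, exactly one ExecStart line must be given
  • For Type=oneshot, you can have multiple ExecStart lines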

ExecStartPre and ExecStartPost

Run commands before or after the main process starts:

ExecStartPre=/opt/myapp/validate-config.sh
ExecStart=/opt/myapp/server
ExecStartPost=/opt/myapp/notify-started.sh

Prefix with - to ignore failures:

# If this check fails, still start the service
ExecStartPre=-/opt/myapp/optional-check.sh
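
As a rough idea of what a pre-start check might contain, here is a minimal sketch. The validate_config function and the "port=" config format are hypothetical, invented for illustration; a real check depends entirely on your application.

```shell
#!/bin/sh
# Hypothetical pre-start check: succeed only if the config file is
# readable and defines a numeric port. "port=" is an assumed format.
validate_config() {
    config_file=$1
    if [ ! -r "$config_file" ]; then
        echo "config not readable: $config_file" >&2
        return 1
    fi
    if ! grep -q '^port=[0-9][0-9]*$' "$config_file"; then
        echo "config has no valid port= line" >&2
        return 1
    fi
    return 0
}

# Demo against a scratch file.
demo_config=$(mktemp)
echo "port=8080" > "$demo_config"
validate_config "$demo_config" && echo "config OK"
```

Used as an ExecStartPre= command, a non-zero exit status stops the service from starting. With the - prefix, the failure is recorded but the start continues.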

ExecStop

How to stop the service. If not specified, systemd sends SIGTERM (and then SIGKILL after a timeout):

# Custom graceful shutdown
ExecStop=/opt/myapp/graceful-shutdown.sh

ExecReload

What to do when systemctl reload is called. Typically sends SIGHUP:

ExecReload=/bin/kill -HUP $MAINPID

$MAINPID is a special variable systemd sets to the PID of the main process.
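
A reload only works if the application handles SIGHUP. The following sketch shows the idea with a small worker script that re-reads a config file when it receives the signal. The worker, its "greeting=" config key, and the temp files are all assumptions for illustration; the worker signals itself to stand in for what ExecReload=/bin/kill -HUP $MAINPID does from the outside.

```shell
#!/bin/sh
# Write a tiny worker that reloads its config on SIGHUP.
worker=$(mktemp)
cat > "$worker" << 'WORKER'
#!/bin/sh
CONFIG_FILE=$1
load_config() { GREETING=$(sed -n 's/^greeting=//p' "$CONFIG_FILE"); }
on_hup() { load_config; echo "reloaded: $GREETING"; }

load_config
echo "started: $GREETING"
trap on_hup HUP

# Simulate an admin editing the config and then requesting a reload.
echo "greeting=goodbye" > "$CONFIG_FILE"
kill -HUP $$
echo "still running after reload"
WORKER

config=$(mktemp)
echo "greeting=hello" > "$config"
output=$(sh "$worker" "$config")
echo "$output"
```

Without the trap, the default action for SIGHUP is to terminate the process, so a reload would stop the service instead of refreshing it.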


Service Types: Type=

The Type= directive tells systemd how your service starts and how to track its main process. Getting this wrong is one of the most common sources of service management bugs.

Type=simple (Default)

systemd considers the service "started" as soon as it has forked the ExecStart process, even before the binary has actually been executed. The process specified by ExecStart is the main process.

[Service]
Type=simple
ExecStart=/usr/bin/python3 /opt/myapp/server.py

Use when: your application runs in the foreground and does not fork.

Type=forking

For traditional daemons that fork a child process and then the parent exits. systemd considers the service started when the parent process exits.

[Service]
Type=forking
PIDFile=/run/myapp.pid
ExecStart=/opt/myapp/start.sh

Use when: your application daemonizes itself (forks into the background). You usually need PIDFile= so systemd can track the main process.

Type=oneshot

For services that do a single task and then exit. systemd waits for the process to finish before considering the unit "started."

[Service]
Type=oneshot
ExecStart=/opt/myapp/run-migration.sh
ExecStart=/opt/myapp/seed-database.sh
RemainAfterExit=yes

Use when: you need to run a setup task at boot (like loading firewall rules). With RemainAfterExit=yes, the unit shows as "active" even after the process exits.

Type=notify

The service sends a notification to systemd when it is ready. This is the most precise way to signal readiness.

[Service]
Type=notify
ExecStart=/opt/myapp/server

The application must call sd_notify(0, "READY=1") (using the systemd library) or write to the $NOTIFY_SOCKET. Many modern services support this (e.g., PostgreSQL, nginx with certain configurations).
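
For a service written as a shell script, the systemd-notify command-line tool can send the same readiness message. The sketch below is a minimal start-up wrapper under stated assumptions: warm_up is a placeholder for real initialisation, and the fallback branch exists only so the script also runs outside systemd. Because systemd-notify runs as a child process, the unit may additionally need NotifyAccess=all for the message to be accepted.

```shell
#!/bin/sh
# Sketch of a Type=notify start-up wrapper. warm_up stands in for real
# initialisation work (loading data, opening connections).
warm_up() {
    sleep 1
}

notify_ready() {
    # systemd sets NOTIFY_SOCKET for services that may send notifications.
    if [ -n "${NOTIFY_SOCKET:-}" ] && command -v systemd-notify >/dev/null 2>&1; then
        systemd-notify --ready --status="Warm-up complete"
    else
        echo "not running under systemd notify; skipping readiness signal"
    fi
}

warm_up
notify_ready
```

Units ordered After= this service are held back until the readiness message arrives, which is the point of Type=notify.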

Type=exec

Similar to simple, but systemd considers the service started only after the binary has been successfully executed (after the exec() system call). This catches cases where the binary does not exist or cannot be executed.

+---------------------------------------------------+
|  Type=simple   ->  Started immediately             |
|  Type=exec     ->  Started after exec() succeeds   |
|  Type=forking  ->  Started when parent exits       |
|  Type=oneshot  ->  Started when process exits      |
|  Type=notify   ->  Started when service signals    |
+---------------------------------------------------+

Think About It: You have an application that takes 30 seconds to warm up (loading data into memory, connecting to databases) before it can serve requests. Which Type would you choose, and why?


Restart Policies

One of systemd's most valuable features: automatic restart when a service crashes.

[Service]
Restart=on-failure
RestartSec=5

Restart= Options

Value          Restarts On
no             Never restart (default)
on-success     Clean exit (exit code 0)
on-failure     Non-zero exit code, signal, timeout, watchdog
on-abnormal    Signal, timeout, watchdog (but NOT non-zero exit)
on-abort       Unclean signal only
on-watchdog    Watchdog timeout only
always         Always restart, no matter what

For most services, you want either on-failure or always:

# Restart only on crashes (not after a clean exit or systemctl stop)
Restart=on-failure
RestartSec=5

# Always restart (even after clean exit -- useful for workers)
Restart=always
RestartSec=5

Preventing Restart Loops

If a service is badly broken, you do not want systemd to restart it forever:

[Unit]
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
Restart=on-failure
RestartSec=5

This means: if the service is started more than 5 times within 300 seconds (5 minutes), systemd stops trying and the service enters a "failed" state.

Distro Note: On systemd 230 and later, StartLimitIntervalSec= and StartLimitBurst= belong in the [Unit] section. Older versions used StartLimitInterval= and StartLimitBurst= in the [Service] section instead. If you put StartLimitIntervalSec= in [Service] on a modern system, systemd typically logs an "unknown key" warning and ignores the setting.
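
The start limit only helps if restarts happen fast enough to exceed the burst inside the interval. The helper below is a rough back-of-the-envelope check, assuming the service fails immediately on every start (real run time would push attempts further apart). The function name limit_can_trigger is made up for this sketch.

```shell
#!/bin/sh
# Rough check: can the start limit ever trigger for a service that fails
# immediately? Attempts are about RestartSec apart, so the attempt that
# exceeds the burst happens roughly burst * RestartSec seconds in.
limit_can_trigger() {
    restart_sec=$1
    burst=$2
    interval=$3
    if [ $((burst * restart_sec)) -lt "$interval" ]; then
        echo "yes"
    else
        echo "no"
    fi
}

limit_can_trigger 5 5 300    # RestartSec=5, burst 5, interval 300
limit_can_trigger 90 5 300   # restarts too slow to ever hit the limit
```

In the second case the service would keep restarting forever, because five starts never fall inside one 300-second window.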


Hands-On: Writing a Custom Service

Let us write a proper service for a Python web application.

Step 1: Create the Application

sudo mkdir -p /opt/mywebapp
sudo tee /opt/mywebapp/app.py << 'PYTHON'
#!/usr/bin/env python3
"""A tiny HTTP server for demonstration."""
from http.server import HTTPServer, SimpleHTTPRequestHandler
import os
import signal
import sys

PORT = int(os.environ.get('PORT', 8080))

def graceful_shutdown(signum, frame):
    print(f"Received signal {signum}, shutting down gracefully...", flush=True)
    sys.exit(0)

signal.signal(signal.SIGTERM, graceful_shutdown)

print(f"Starting server on port {PORT}", flush=True)
server = HTTPServer(('', PORT), SimpleHTTPRequestHandler)
print(f"Server is ready and listening on port {PORT}", flush=True)
server.serve_forever()
PYTHON
sudo chmod +x /opt/mywebapp/app.py

Step 2: Create a Dedicated User

sudo useradd --system --no-create-home --shell /usr/sbin/nologin mywebapp

Step 3: Write the Unit File

sudo tee /etc/systemd/system/mywebapp.service << 'UNIT'
[Unit]
Description=My Python Web Application
Documentation=https://example.com/mywebapp
After=network.target
# Give up after 5 start attempts within 5 minutes
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
Type=simple
User=mywebapp
Group=mywebapp
WorkingDirectory=/opt/mywebapp
Environment=PORT=8080
ExecStart=/usr/bin/python3 /opt/mywebapp/app.py
# No ExecReload=: app.py installs no SIGHUP handler, so a HUP would terminate it
Restart=on-failure
RestartSec=5

# Security hardening
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes

StandardOutput=journal
StandardError=journal
SyslogIdentifier=mywebapp

[Install]
WantedBy=multi-user.target
UNIT

Step 4: Deploy and Start

# Reload systemd to pick up the new unit file
sudo systemctl daemon-reload

# Start and enable the service
sudo systemctl enable --now mywebapp.service

# Check status
systemctl status mywebapp.service

# Test it
curl http://localhost:8080/

Step 5: Test Restart Behavior

# Find the main PID
systemctl show mywebapp.service --property=MainPID
# MainPID=12345

# Kill it rudely (simulating a crash)
sudo kill -9 $(systemctl show mywebapp.service --property=MainPID --value)

# Wait a moment, then check -- it should have restarted
sleep 6
systemctl status mywebapp.service
# Notice the PID has changed and the service is active

Step 6: Check the Logs

# View all logs for this service
journalctl -u mywebapp.service --no-pager

# Follow logs in real time
journalctl -u mywebapp.service -f

Clean Up

sudo systemctl disable --now mywebapp.service
sudo rm /etc/systemd/system/mywebapp.service
sudo rm -rf /opt/mywebapp
sudo userdel mywebapp
sudo systemctl daemon-reload

Service Security Hardening

systemd provides powerful security directives that sandbox your service. Use them wherever possible:

[Service]
# Run as non-root
User=myapp
Group=myapp

# Cannot gain new privileges (e.g., via setuid binaries)
NoNewPrivileges=yes

# Make the entire filesystem read-only except specified paths
ProtectSystem=strict
ReadWritePaths=/var/lib/myapp /var/log/myapp

# Hide /home, /root, /run/user
ProtectHome=yes

# Private /tmp (isolated from other services)
PrivateTmp=yes

# Cannot modify kernel variables
ProtectKernelTunables=yes

# Cannot load kernel modules
ProtectKernelModules=yes

# Restrict network families
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX

# Restrict system calls
SystemCallFilter=@system-service

You can check the security score of any service:

systemd-analyze security mywebapp.service

This gives each service an exposure score from 0 (fully locked down) to 10 (fully exposed) and shows which hardening directives are missing. Lower is better.


Dependencies Deep Dive

Ordering: After= and Before=

These control when units start relative to each other:

# My app starts AFTER network and database are ready
After=network.target postgresql.service

# My app starts BEFORE the monitoring agent
Before=monitoring-agent.service

Dependency: Requires= and Wants=

These control whether units must succeed:

# Hard dependency: if PostgreSQL fails to start, my app fails too
Requires=postgresql.service

# Soft dependency: try to start Redis, but my app works without it
Wants=redis.service

Combining Them

A complete dependency setup:

[Unit]
Description=My Application
After=network.target postgresql.service redis.service
Requires=postgresql.service
Wants=redis.service

This means:

  1. Start after network, PostgreSQL, and Redis
  2. Fail if PostgreSQL is not running
  3. Continue even if Redis is not running
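
The rule that every Requires= entry should also appear in After= can be checked mechanically. The helper below is a minimal sketch, not a real unit-file parser: missing_ordering is a made-up name, and it only understands plain Requires= and After= lines.

```shell
#!/bin/sh
# List every unit named in Requires= that does not also appear in After=.
# Only plain "Key=value" lines are handled; this is a sketch, not a parser.
missing_ordering() {
    unit_file=$1
    requires=$(sed -n 's/^Requires=//p' "$unit_file")
    after=$(sed -n 's/^After=//p' "$unit_file")
    for dep in $requires; do
        case " $(echo $after) " in
            *" $dep "*) ;;            # ordered as well as required
            *) echo "$dep" ;;         # required but not ordered
        esac
    done
}

# Demo unit: redis.service is required but never ordered.
demo_unit=$(mktemp)
cat > "$demo_unit" << 'UNIT'
[Unit]
After=network.target postgresql.service
Requires=postgresql.service redis.service
UNIT

missing_ordering "$demo_unit"
```

Any unit it prints could start in parallel with your service, which is the race condition described earlier.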

BindsTo=

Stronger than Requires. If the bound unit stops at any time (not just at startup), this unit also stops:

[Unit]
BindsTo=postgresql.service
After=postgresql.service

Conflicts=

Ensures two units never run simultaneously:

[Unit]
Conflicts=apache2.service

If you start this service, apache2.service is stopped automatically.


systemd Timers: The Modern Cron

systemd timers are a powerful replacement for cron jobs. They offer better logging, dependency management, and resource control.

Timer Anatomy

A timer requires two files:

  1. A .timer unit (the schedule)
  2. A .service unit (the actual work)

Example: Run a Backup Every Day at 2 AM

The service file (/etc/systemd/system/backup.service):

[Unit]
Description=Daily Backup Job

[Service]
Type=oneshot
ExecStart=/opt/scripts/backup.sh
User=backup
StandardOutput=journal

The timer file (/etc/systemd/system/backup.timer):

[Unit]
Description=Run Backup Daily at 2 AM

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true
RandomizedDelaySec=300

[Install]
WantedBy=timers.target

# Enable the timer (not the service)
sudo systemctl daemon-reload
sudo systemctl enable --now backup.timer

# Check when it will fire next
systemctl list-timers backup.timer --no-pager

Timer Directives

Directive             Purpose
OnCalendar=           Calendar-based schedule (like cron)
OnBootSec=            Run X time after boot
OnUnitActiveSec=      Run X time after the service last ran
OnStartupSec=         Run X time after systemd started
Persistent=true       If the system was off when the timer should have fired, run it at next boot
RandomizedDelaySec=   Add random delay to prevent thundering herd
AccuracySec=          How precise the timer needs to be

OnCalendar Syntax

The OnCalendar format is DayOfWeek Year-Month-Day Hour:Minute:Second. Unit files do not support trailing comments on a directive line, so each description below sits on its own line:

# Every day at 2:00 AM
OnCalendar=*-*-* 02:00:00
# Every Monday at 9:00 AM
OnCalendar=Mon *-*-* 09:00:00
# First day of every month
OnCalendar=*-*-01 00:00:00
# January 1st every year
OnCalendar=*-01-01 00:00:00
# Every hour
OnCalendar=hourly
# Every day at midnight
OnCalendar=daily
# Every Monday at midnight
OnCalendar=weekly
# Every hour on the hour
OnCalendar=*-*-* *:00:00
# Every minute
OnCalendar=*-*-* *:*:00
# Every hour from 8 AM to 5 PM
OnCalendar=*-*-* 08..17:00:00

Validate your schedule with systemd-analyze:

# When will this fire next?
systemd-analyze calendar "Mon *-*-* 09:00:00"
# Next elapse: Mon 2025-03-17 09:00:00 UTC

# How about every 15 minutes?
systemd-analyze calendar "*-*-* *:00/15:00"

Relative Timers

Instead of calendar-based, run relative to events:

[Timer]
# 15 minutes after boot
OnBootSec=15min

# Every 30 minutes after the service last ran
OnUnitActiveSec=30min
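
Because a timer activates the service with the same base name, the two files are usually written as a pair. The sketch below generates such a pair into a scratch directory. The write_timer_pair helper, the "heartbeat" name, and the logger command are all assumptions for illustration; on a real system the files would go in /etc/systemd/system/.

```shell
#!/bin/sh
# Write a matching .service/.timer pair into a target directory.
# "heartbeat" and its command are hypothetical names for illustration.
write_timer_pair() {
    dir=$1
    name=$2
    interval=$3
    command=$4
    cat > "$dir/$name.service" << EOF
[Unit]
Description=$name job

[Service]
Type=oneshot
ExecStart=$command
EOF
    cat > "$dir/$name.timer" << EOF
[Unit]
Description=Run $name every $interval

[Timer]
OnBootSec=$interval
OnUnitActiveSec=$interval

[Install]
WantedBy=timers.target
EOF
}

scratch=$(mktemp -d)
write_timer_pair "$scratch" heartbeat 30min "/usr/bin/logger heartbeat"
cat "$scratch/heartbeat.timer"
```

OnBootSec= provides the first run after boot; OnUnitActiveSec= then keeps the cycle going relative to each activation.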

Socket Activation Basics

Socket activation lets systemd listen on a port and start the service only when a connection arrives. This means:

  • Services start on-demand, not at boot (faster boot)
  • If no one connects, the service never runs (saves resources)
  • systemd can restart a crashed service without losing queued connections

How Socket Activation Works

+----------+       +---------+       +----------+
| Client   | ----> | systemd | ----> | Service  |
| connects |       | (holds  |       | (started |
|          |       |  socket)|       |  on demand)|
+----------+       +---------+       +----------+

1. systemd creates the socket and listens
2. Client connects to the socket
3. systemd starts the service
4. systemd passes the socket file descriptor to the service
5. Service handles the connection
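
Step 4 relies on a small environment-variable protocol: systemd sets LISTEN_PID to the PID of the started process and LISTEN_FDS to the number of sockets passed, which begin at file descriptor 3. The sketch below only inspects those variables; describe_passed_sockets is a made-up helper, and a real service would go on to accept connections on the inherited descriptor.

```shell
#!/bin/sh
# Report whether this process was handed sockets by systemd.
# Passed descriptors start at fd 3 (0, 1, 2 are stdin/stdout/stderr).
describe_passed_sockets() {
    if [ "${LISTEN_PID:-}" = "$$" ] && [ "${LISTEN_FDS:-0}" -ge 1 ]; then
        echo "$LISTEN_FDS socket(s) passed, starting at fd 3"
    else
        echo "no sockets passed; not socket-activated"
    fi
}

# Started from a terminal: nothing was passed.
(unset LISTEN_PID LISTEN_FDS; describe_passed_sockets)

# Simulate what systemd would set for one listening socket.
(LISTEN_PID=$$; LISTEN_FDS=1; describe_passed_sockets)
```

The LISTEN_PID comparison matters because child processes inherit the environment; only the process systemd actually started should claim the sockets.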

Example: Socket-Activated Service

Socket file (/etc/systemd/system/myapp.socket):

[Unit]
Description=My App Socket

[Socket]
ListenStream=8080
Accept=no

[Install]
WantedBy=sockets.target

Service file (/etc/systemd/system/myapp.service):

[Unit]
Description=My App Service
Requires=myapp.socket

[Service]
Type=simple
ExecStart=/opt/myapp/server

# Enable the socket (not the service directly)
sudo systemctl enable --now myapp.socket

# The service is not running yet
systemctl is-active myapp.service
# inactive

# Connect to the socket
curl http://localhost:8080/
# Now the service starts automatically

systemctl is-active myapp.service
# active

Debug This: Service Keeps Crashing

Your custom service starts but immediately dies, and systemd keeps restarting it:

● myapp.service - My Application
     Active: activating (auto-restart) (Result: exit-code)

The journal shows:

myapp.service: Main process exited, code=exited, status=1/FAILURE
myapp.service: Scheduled restart job, restart counter is at 4.

Here is your debugging checklist:

  1. Check the full journal output:

    journalctl -u myapp.service -n 100 --no-pager
    
  2. Run ExecStart manually to see errors directly:

    systemctl cat myapp.service | grep ExecStart
    # Then run that command as the same user:
    sudo -u appuser /opt/myapp/server
    
  3. Check that the binary exists and is executable:

    ls -la /opt/myapp/server
    file /opt/myapp/server
    
  4. Check that the user has correct permissions:

    sudo -u appuser ls -la /opt/myapp/
    sudo -u appuser cat /opt/myapp/config.yaml
    
  5. Check environment variables:

    systemctl show myapp.service --property=Environment
    
  6. Temporarily stop restart looping to debug:

    sudo systemctl stop myapp.service
    # Now you can investigate without it restarting
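
The status= number in the journal is itself a clue. Codes of 200 and above are generally reported by systemd when it cannot set up or launch the process at all, while low numbers usually come from the application. The lookup below is a deliberately partial sketch covering a few codes you are likely to meet; explain_status is a hypothetical helper name.

```shell
#!/bin/sh
# Translate a few exit statuses that systemd itself reports when it
# cannot launch the process. The list is deliberately partial.
explain_status() {
    case $1 in
        200) echo "CHDIR: WorkingDirectory= could not be entered" ;;
        203) echo "EXEC: the ExecStart= binary could not be executed" ;;
        217) echo "USER: the User= account could not be resolved or switched to" ;;
        226) echo "NAMESPACE: sandbox setup failed (check ReadWritePaths= and similar)" ;;
        *)   echo "application-defined: consult the program's own logs" ;;
    esac
}

explain_status 203
explain_status 1
```

For example, status=203/EXEC usually points back to items 2 and 3 of the checklist: a wrong path or a missing execute bit.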
    

Where Unit Files Live

+------------------------------------------------------------------+
|  /usr/lib/systemd/system/     <- Vendor/package-provided units   |
|                                   (do NOT edit these directly)    |
|                                                                   |
|  /etc/systemd/system/         <- Admin-created units (your stuff)|
|                                   (this is where you create them)|
|                                                                   |
|  /run/systemd/system/         <- Runtime-only units              |
|                                   (disappear on reboot)          |
+------------------------------------------------------------------+

Priority: /etc > /run > /usr/lib

To override a vendor-provided unit without editing it directly:

# Create an override directory
sudo systemctl edit nginx.service
# This opens an editor for /etc/systemd/system/nginx.service.d/override.conf

Or manually:

sudo mkdir -p /etc/systemd/system/nginx.service.d/
sudo tee /etc/systemd/system/nginx.service.d/override.conf << 'OVERRIDE'
[Service]
LimitNOFILE=65535
OVERRIDE
sudo systemctl daemon-reload
sudo systemctl restart nginx
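
Overriding a single-value setting such as LimitNOFILE= just replaces it, but ExecStart= is a list: a drop-in has to clear it with an empty assignment before setting the new command. The sketch below writes such a drop-in into a scratch directory. The write_execstart_override helper, the myapp.service name, and the port are assumptions for illustration.

```shell
#!/bin/sh
# Write a drop-in that replaces ExecStart=. The empty assignment clears
# the original value first; without it, a non-oneshot service would end
# up with two ExecStart= commands and systemd would reject the unit.
write_execstart_override() {
    dropin_dir=$1
    new_command=$2
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/override.conf" << EOF
[Service]
ExecStart=
ExecStart=$new_command
EOF
}

# Demo in a scratch directory instead of /etc/systemd/system.
scratch=$(mktemp -d)
write_execstart_override "$scratch/myapp.service.d" "/opt/myapp/server --port 9090"
cat "$scratch/myapp.service.d/override.conf"
```

After installing a real drop-in, systemctl cat shows the merged result so you can confirm which values won.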

What Just Happened?

+------------------------------------------------------------------+
|                     CHAPTER 16 RECAP                              |
+------------------------------------------------------------------+
|                                                                  |
|  - Unit files have three sections: [Unit], [Service], [Install]  |
|  - ExecStart= must use absolute paths                           |
|  - Type= controls how systemd tracks your process:              |
|    simple, forking, oneshot, notify                              |
|  - Restart=on-failure with RestartSec= for automatic recovery   |
|  - After= controls order; Requires= controls dependency         |
|  - Use both together for proper dependency management            |
|  - systemd timers replace cron with OnCalendar= schedules       |
|  - Socket activation starts services on-demand                   |
|  - Put custom units in /etc/systemd/system/                     |
|  - Always run daemon-reload after editing unit files             |
|  - Security hardening: User=, ProtectSystem=, PrivateTmp=       |
|                                                                  |
+------------------------------------------------------------------+

Try This

Exercise 1: Write a One-Shot Service

Write a Type=oneshot service that creates a /tmp/system-booted file containing the current timestamp. Enable it so it runs at boot.

Exercise 2: Build a Timer

Create a systemd timer that runs a script every 15 minutes. The script should append the current date and system load average (uptime) to a log file. Verify the timer with systemctl list-timers.

Exercise 3: Dependency Chain

Create three services: service-a, service-b, and service-c. Configure them so that service-c requires service-b, which requires service-a. Verify the ordering:

sudo systemctl start service-c.service
# Should automatically start a and b first

Exercise 4: Crash Recovery

Create a service that intentionally exits with an error after 5 seconds. Set up Restart=on-failure with RestartSec=3. Watch it restart using journalctl -f. Then add StartLimitBurst=3 and StartLimitIntervalSec=60 and observe what happens after the third failure.

Bonus Challenge

Convert one of your existing cron jobs to a systemd timer. Compare the two approaches. Which gives you better logging? Which is easier to debug when something goes wrong?