Welcome to Embedded Systems Programming with STM32 and Rust

So you want to make hardware do things. Real, physical things — blink an LED, read a temperature sensor, drive a motor, talk to a satellite. You have come to the right place.

This book will take you from zero embedded experience to writing production-quality firmware in Rust on STM32 microcontrollers. No prior knowledge of electronics, embedded systems, or even Rust is assumed. If you can write a for loop in any language, you are ready.

What This Book Covers

We are going to build things. Lots of things. Along the way, you will learn:

  • Electronics fundamentals — just enough to not destroy your hardware
  • What a microcontroller actually is and how it works at the register level
  • The STM32 family — the most popular 32-bit microcontroller platform on the planet
  • Rust for embedded — why it is uniquely suited to firmware development
  • The Embassy async framework — modern, safe, productive embedded Rust
  • Every major peripheral — GPIO, UART, SPI, I2C, Timers, ADC, DAC, DMA, USB, CAN
  • Real-world projects — from blinking LEDs to wireless sensor networks

Which STM32 Chips Are Covered?

All of them. Well, almost all of them. This book covers the major STM32 families:

| Category | Families | Cortex-M Core |
|---|---|---|
| Entry-Level | F0, G0, L0 | M0 / M0+ |
| Mainstream | F1, F3, G4 | M3 / M4F |
| Workhorse | F4 | M4F |
| High Performance | F7, H7, H5 | M7 / M33 |
| Ultra-Low-Power | L4, U5 | M4F / M33 |
| Wireless | WB, WL | M4F / M0+ |

The examples primarily target the STM32F411 (the chip on a cheap, widely available board called the "Black Pill") and the STM32H743 (a high-performance beast), but every concept translates across the entire lineup. That is the beauty of STM32: learn one, and you know them all.

Why Rust?

C has been the language of embedded systems for 40 years. It works. But it also gives you:

  • Buffer overflows that corrupt memory silently
  • Data races in interrupt handlers that happen once a week under a full moon
  • Null pointer dereferences that brick a device in the field
  • Use-after-free bugs that take months to track down

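Rust eliminates entire categories of these bugs at compile time. Not with a linter. Not in code review. At compile time, before your code ever touches hardware. The few checks that cannot be made statically, such as array bounds, become a deterministic panic instead of silent memory corruption. The compiler is your co-pilot, and it has seen every crash you are about to write.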

Why Embassy?

Embassy is an async runtime for embedded Rust. If you have used async/await in JavaScript, Python, or Rust on desktop, the mental model is the same — except Embassy runs on a microcontroller with no operating system, no heap allocation, and no standard library.

Here is what a complete Embassy program looks like:

#![no_std]
#![no_main]

use embassy_executor::Spawner;
use embassy_stm32::gpio::{Level, Output, Speed};
use embassy_time::Timer;
use defmt_rtt as _;
use panic_probe as _;

#[embassy_executor::main]
async fn main(_spawner: Spawner) {
    let p = embassy_stm32::init(Default::default());

    let mut led = Output::new(p.PC13, Level::High, Speed::Low);

    loop {
        led.set_high();
        Timer::after_millis(500).await;
        led.set_low();
        Timer::after_millis(500).await;
    }
}

That is a fully working blink program. No register manipulation, no unsafe blocks, no 200 lines of boilerplate. Just intent, clearly expressed.

What You Will Need

To follow along with this book, you will need:

  1. A computer — Windows, macOS, or Linux all work fine
  2. An STM32 development board — the STM32F411 "Black Pill" (around 300 INR) is our recommended starter board
  3. A USB cable — to connect the board to your computer
  4. A debugger/programmer — the ST-Link V2 clone (around 200 INR) or a DAPLink probe
  5. Basic components — LEDs, resistors, jumper wires, a breadboard (a starter kit for under 500 INR)
  6. Curiosity — the most important tool in this list

You do not need an electronics degree. You do not need to know what a capacitor does (yet). You do not need to have touched a soldering iron. Chapter 1 covers everything you need to know about electronics to get started safely.

How to Read This Book

This book is designed to be read sequentially. Each chapter builds on the previous one. Resist the urge to skip ahead to the SPI chapter before understanding GPIO — you will just end up coming back.

That said, once you have the fundamentals down (Chapters 1-6), the peripheral chapters (7+) can be read in any order that interests you.

Throughout the book, you will see callout boxes:

Fun Fact: These contain interesting tidbits about hardware, history, or the embedded industry. They are fun. Read them.

Think About It: These pose questions to deepen your understanding. Pause and actually think before reading on. Your future self debugging at 2 AM will thank you.

Warning: These highlight common mistakes that can damage hardware or waste hours of debugging. Read these twice.

What Embassy Gives You Over Bare-Metal

You might wonder why we use a framework at all. Why not just write to registers directly? Here is a comparison of reading a UART byte with and without Embassy:

// Without Embassy: manual register polling (STM32F4 register layout)
let byte = unsafe {
    let usart2_sr = 0x4000_4400 as *const u32; // status register
    let usart2_dr = 0x4000_4404 as *const u32; // data register

    // Spin-wait until RXNE (bit 5) reports that a byte has arrived
    while core::ptr::read_volatile(usart2_sr) & (1 << 5) == 0 {}

    core::ptr::read_volatile(usart2_dr) as u8
};

// With Embassy: async, safe, readable
// (`p`, `Irqs`, and `uart_config` are set up earlier in the program;
// the TX DMA channel is passed before the RX DMA channel)
let mut uart = Uart::new(p.USART2, p.PA3, p.PA2, Irqs, p.DMA1_CH6,
                          p.DMA1_CH5, uart_config);
let mut buf = [0u8; 1];
uart.read(&mut buf).await.unwrap();
let byte = buf[0];

Embassy provides:

  • Async/await — no busy-waiting, no manual interrupt juggling
  • Type safety — the compiler ensures you connect the right pins to the right peripherals
  • DMA integration — data transfers happen in hardware while your CPU does other work
  • Multi-tasking — run multiple concurrent tasks without an RTOS or threading bugs
  • Timers and delays — Timer::after_millis(100).await just works, accurately
  • Portable — same API across STM32, nRF, RP2040, and more

The Journey Ahead

Here is a roadmap of what lies ahead:

| Chapters | Topic | What You Will Build |
|---|---|---|
| 1-3 | Foundations | Understanding electronics, MCUs, and the STM32 family |
| 4-5 | Setup & First Code | Toolchain setup, your first blink program |
| 6-7 | GPIO & Interrupts | Button inputs, LED control, external interrupts |
| 8-9 | Timers & PWM | Precise timing, LED dimming, servo control |
| 10-11 | UART & Serial | PC communication, GPS modules, debug logging |
| 12-13 | SPI & I2C | Displays, sensors, EEPROMs |
| 14-15 | ADC & DAC | Voltage reading, analog output, signal generation |
| 16-17 | DMA & Performance | Zero-copy transfers, efficient data handling |
| 18-19 | USB & CAN | USB devices, automotive communication |
| 20+ | Advanced Projects | Wireless sensors, motor control, RTOS concepts |

Each chapter follows the same pattern: concept explanation, register-level understanding, Embassy code, and a hands-on project you build and run on real hardware.

A Note on Mistakes

You will make mistakes. You will wire something backwards, forget a pull-up resistor, flash the wrong binary, and stare at a serial terminal printing garbage because your baud rate is off by one digit. This is normal. This is how learning works.

The good news: STM32 chips are surprisingly durable. Short of connecting 12V to a 3.3V pin, they will survive most beginner mistakes. And at 300 INR each, even the worst-case scenario is a cheap lesson.

Fun Fact: Every professional embedded engineer has a drawer of dead microcontrollers somewhere. It is a badge of honor. The only engineer who has never killed a chip is the one who has never built anything.

Let Us Begin

The microcontroller on your desk is a tiny computer. It has a processor, memory, and input/output pins — everything a computer needs. It just happens to run at 3.3 volts, fit on your thumbnail, and cost less than a cup of coffee.

By the end of this book, you will understand how it works down to the register level. But more importantly, you will be able to build things with it: reliably, safely, and in Rust.

Turn the page. Let us start with the basics.

Chapter 1: The Minimum You Need to Know Before Starting

You do not need an electrical engineering degree to program microcontrollers. But you do need to understand a few things about electricity, or you will destroy a chip. This chapter is your crash course.

Electricity in 5 Minutes

Three quantities govern everything:

| Quantity | Unit | Symbol | Analogy |
|---|---|---|---|
| Voltage | Volts (V) | V | Water pressure in a pipe |
| Current | Amperes (A) | I | Amount of water flowing |
| Resistance | Ohms (Ohm) | R | How narrow the pipe is |

They are related by Ohm's Law, the single most important equation in electronics:

V = I x R

If you know any two, you can calculate the third. You will use this constantly.

There is one more relationship you need — power:

P = V x I

Power is measured in Watts (W). It tells you how fast a component consumes energy (and how much heat it generates). An STM32 running at full speed draws about 100 mA at 3.3V, which means P = 3.3 x 0.1 = 0.33 Watts. Barely warm to the touch.
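If you prefer to see the two equations as code, here is a minimal sketch. The function names are made up for this illustration and are not part of any library; the math is just Ohm's Law and the power formula rearranged.

```rust
/// Ohm's Law solved for current: I = V / R.
pub fn current_amps(volts: f64, ohms: f64) -> f64 {
    volts / ohms
}

/// Ohm's Law solved for resistance: R = V / I.
pub fn resistance_ohms(volts: f64, amps: f64) -> f64 {
    volts / amps
}

/// Power: P = V x I.
pub fn power_watts(volts: f64, amps: f64) -> f64 {
    volts * amps
}
```

Calling power_watts(3.3, 0.1) reproduces the 0.33 W figure above, and current_amps(3.3, 330.0) tells you that a 330 Ohm resistor across 3.3V passes about 10 mA.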

Fun Fact: STM32 microcontrollers run at 3.3V. This is important. Not 5V like old Arduinos, not 1.8V like some ultra-low-power chips. 3.3 volts. Burn this number into your brain.

Analog vs. Digital

The physical world is analog — temperature varies smoothly, light dims gradually, sound is a continuous wave. Microcontrollers are digital — they think in ones and zeros.

A digital pin has two states:

| State | Voltage | Meaning |
|---|---|---|
| HIGH | 3.3V | Logic 1, ON, True |
| LOW | 0V (GND) | Logic 0, OFF, False |

That is it. A GPIO pin is either 3.3V or 0V. There is no "sort of on."

So how do we bridge the gap?

  • ADC (Analog-to-Digital Converter) — reads a smooth analog voltage and converts it to a number. A 12-bit ADC on STM32 gives you values from 0 (0V) to 4095 (3.3V). This is how you read sensors.
  • PWM (Pulse Width Modulation) — fakes an analog output by switching a digital pin on and off very fast. If a pin is HIGH 50% of the time, the load it drives sees roughly half the voltage on average. This is how you dim LEDs and, through a driver circuit, control motor speed.

Think About It: If your ADC reads a value of 2048, what voltage is that? (Hint: 2048 is exactly half of 4096, and your reference voltage is 3.3V.)
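Once you have worked out your own answer, you can check it with a sketch like the one below. It assumes a 12-bit ADC with a 3.3V reference, as described above; the constant and function names are invented for this example.

```rust
/// ADC reference voltage in volts (assumed equal to the 3.3 V supply).
pub const VREF: f64 = 3.3;
/// Largest code a 12-bit ADC can return.
pub const ADC_MAX: u16 = 4095;

/// Convert a raw 12-bit ADC reading to volts.
/// Readings above 4095 are clamped, since a 12-bit ADC cannot produce them.
pub fn adc_to_volts(raw: u16) -> f64 {
    f64::from(raw.min(ADC_MAX)) * VREF / f64::from(ADC_MAX)
}

/// Average voltage of a PWM signal for a duty cycle between 0.0 and 1.0.
pub fn pwm_average_volts(duty: f64) -> f64 {
    duty.clamp(0.0, 1.0) * VREF
}
```

The two functions are mirror images: the ADC turns a voltage into a number, and PWM turns a number (the duty cycle) back into an average voltage. A 50% duty cycle gives pwm_average_volts(0.5), which is 1.65V.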

Components You Will Actually Use

Resistors

A resistor limits current flow. That is its only job. You will use them constantly for:

  • Current limiting — protecting LEDs and GPIO pins from drawing too much current
  • Pull-up / pull-down — giving floating pins a defined state (more on this later)
  • Voltage dividers — scaling voltages down to safe levels

Resistor values are marked with colored bands. You do not need to memorize the color code — just use a multimeter or look up a color band calculator online. Common values you will use: 220 Ohm, 1K Ohm, 4.7K Ohm, 10K Ohm, 100K Ohm.

Capacitors

Capacitors store small amounts of charge. In embedded systems, their most critical role is bypass (decoupling) capacitors.

Warning: Every STM32 chip requires 100nF ceramic capacitors between each VDD pin and GND. These are not optional. Without them, the chip will behave erratically — random resets, corrupted data, peripherals misbehaving. Every development board has them already soldered on. If you ever design your own PCB, this is rule number one.

LEDs

LEDs (Light Emitting Diodes) have polarity — they only work in one direction. The longer leg is the anode (positive), the shorter leg is the cathode (negative).

LEDs also require a current-limiting resistor. Without one, the LED will draw as much current as it can, overheat, and die — possibly taking your GPIO pin with it.

For a typical LED at 3.3V with a 2V forward voltage and 10mA desired current:

R = (3.3V - 2V) / 0.010A = 130 Ohm (use 150 Ohm or 220 Ohm, the nearest standard values)
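The same calculation can be written as a small helper. This is a sketch under stated assumptions: the function names are hypothetical, and the rounding step picks from the common E12 resistor series, always rounding up so the LED current ends up at or below the target.

```rust
/// E12 series base values for one decade.
const E12: [f64; 12] = [
    10.0, 12.0, 15.0, 18.0, 22.0, 27.0, 33.0, 39.0, 47.0, 56.0, 68.0, 82.0,
];

/// Series resistor for an LED: R = (V_supply - V_forward) / I.
pub fn led_resistor_ohms(supply_v: f64, forward_v: f64, current_a: f64) -> f64 {
    (supply_v - forward_v) / current_a
}

/// Smallest E12 value between 10 Ohm and 8.2 MOhm that is at least `ohms`.
/// Returns None if `ohms` is above that range.
pub fn next_e12_at_least(ohms: f64) -> Option<f64> {
    let mut decade = 1.0;
    for _ in 0..6 {
        for &base in E12.iter() {
            let value = base * decade;
            if value >= ohms {
                return Some(value);
            }
        }
        decade *= 10.0;
    }
    None
}
```

For the example above, led_resistor_ohms(3.3, 2.0, 0.010) gives about 130 Ohm and next_e12_at_least rounds that up to 150 Ohm. Rounding up rather than down is the cautious choice: a slightly larger resistor only makes the LED a little dimmer.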

Crystal Oscillators

These provide the clock signal that drives the microcontroller. Many STM32 boards have an 8 MHz crystal (HSE, High Speed External); the Black Pill carries a 25 MHz one. The chip scales this internally using a PLL to reach its full speed (for example, an 8 MHz input can be brought up to 96 MHz on an F411, an effective multiplication by 12).

Voltage Regulators

These convert one voltage to another. Your USB port provides 5V, but the STM32 needs 3.3V. A voltage regulator on the board handles this conversion. The popular AMS1117-3.3 is found on many dev boards.

How Not to Destroy Your Hardware

These rules are simple. Follow them and your chips will live long, productive lives.

  1. Never apply more than 3.3V to a GPIO pin. Some pins are 5V-tolerant (check the datasheet), but 3.3V is always safe.
  2. Never drive motors, relays, or solenoids directly from a GPIO pin. A GPIO pin can source about 20 mA. A small motor draws 200-500 mA. Use a transistor or motor driver.
  3. Always share GND. When connecting two devices, their ground pins must be connected together. No common ground, no communication.
  4. Always use current-limiting resistors with LEDs. The 10 seconds you save skipping the resistor is not worth the dead LED and possibly dead GPIO.

Voltage Dividers

A voltage divider uses two resistors to reduce a voltage proportionally:

V_in ---[R1]---+---[R2]--- GND
                |
              V_out

The formula:

V_out = V_in x R2 / (R1 + R2)

Practical Example: You want to measure a 12V car battery with your 3.3V ADC.

You need V_out = 3.3V when V_in = 12V:

3.3 = 12 x R2 / (R1 + R2)

Using R1 = 27K Ohm and R2 = 10K Ohm:

V_out = 12 x 10000 / (27000 + 10000) = 12 x 0.27 = 3.24V

Close enough, and just under the 3.3V limit, although with little margin to spare.
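To see how much headroom a divider really leaves, it helps to run the formula in both directions. The sketch below uses hypothetical function names and assumes an unloaded divider (nothing drawing significant current from V_out).

```rust
/// Output of an unloaded resistive divider: V_out = V_in * R2 / (R1 + R2).
pub fn divider_out_volts(v_in: f64, r1_ohms: f64, r2_ohms: f64) -> f64 {
    v_in * r2_ohms / (r1_ohms + r2_ohms)
}

/// Highest input voltage that keeps V_out at or below `v_limit`.
/// This is the divider formula solved for V_in.
pub fn max_safe_input_volts(v_limit: f64, r1_ohms: f64, r2_ohms: f64) -> f64 {
    v_limit * (r1_ohms + r2_ohms) / r2_ohms
}
```

With R1 = 27K Ohm and R2 = 10K Ohm, divider_out_volts(12.0, 27_000.0, 10_000.0) gives about 3.24V, and max_safe_input_volts(3.3, 27_000.0, 10_000.0) gives about 12.2V. A car's electrical system often sits above 12V while the engine is charging the battery, so for a real measurement you would likely choose a larger R1.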

Think About It: What happens if R2 is much larger than R1? What if R1 is much larger than R2? Think about the formula and the extremes.

Pull-Up and Pull-Down Resistors

When a GPIO pin is configured as an input but nothing is connected to it, it is floating — its voltage drifts randomly between HIGH and LOW. This causes unpredictable behavior.

A pull-up resistor (typically 10K Ohm) connects the pin to 3.3V, giving it a default HIGH state. A pull-down resistor connects it to GND, giving it a default LOW.

Most STM32 pins have internal pull-up and pull-down resistors that you can enable in software:

use embassy_stm32::gpio::{Input, Pull};

// `p` is the peripherals struct returned by embassy_stm32::init()

// Enable internal pull-up: pin reads HIGH when nothing is connected
let button = Input::new(p.PA0, Pull::Up);

// Enable internal pull-down: pin reads LOW when nothing is connected
let sensor = Input::new(p.PB5, Pull::Down);

The internal pull-ups are weak (around 40K Ohm). For I2C communication, you must use external pull-up resistors (typically 4.7K Ohm on both SDA and SCL lines). The internal ones are too weak for I2C's open-drain signaling.

Binary, Hexadecimal, and Bits

Microcontroller registers are 32 bits wide. You need to be comfortable with binary and hexadecimal.

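| Decimal | Binary | Hex |
|---|---|---|
| 0 | 0000 | 0x0 |
| 1 | 0001 | 0x1 |
| 2 | 0010 | 0x2 |
| 3 | 0011 | 0x3 |
| 4 | 0100 | 0x4 |
| 5 | 0101 | 0x5 |
| 6 | 0110 | 0x6 |
| 7 | 0111 | 0x7 |
| 8 | 1000 | 0x8 |
| 9 | 1001 | 0x9 |
| 10 | 1010 | 0xA |
| 11 | 1011 | 0xB |
| 12 | 1100 | 0xC |
| 13 | 1101 | 0xD |
| 14 | 1110 | 0xE |
| 15 | 1111 | 0xF |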

Each hex digit represents exactly 4 bits. So a 32-bit register like 0x4001_0014 is really 32 binary digits grouped into 8 hex digits.

Rust makes working with these representations clean:

#![allow(unused)]
fn main() {
let binary_value = 0b0000_0000_0010_0000; // bit 5 set
let hex_value    = 0x0020;                 // same thing
let decimal      = 32;                     // same thing again

// Rust lets you use underscores for readability
let register_addr = 0x4002_0400; // GPIOB base address
let big_number    = 1_000_000;   // one million, easy to read
}

Fun Fact: Rust has had the 0b prefix for binary literals from the start. Standard C only gained it in C23; before that it was a compiler extension. It makes register manipulation dramatically more readable.

Bitwise Operations: The Language of Registers

Registers are controlled one bit at a time. Here are the four operations you will use constantly:

Set a Bit (Turn ON)

#![allow(unused)]
fn main() {
// Set bit 5 (turn on pin 5)
let mut register: u32 = 0b0000_0000;
register |= 1 << 5;
// register is now 0b0010_0000
}

Clear a Bit (Turn OFF)

#![allow(unused)]
fn main() {
// Clear bit 5 (turn off pin 5)
let mut register: u32 = 0b0010_0000;
register &= !(1 << 5);
// register is now 0b0000_0000
}

Check a Bit (Read state)

#![allow(unused)]
fn main() {
// Check if bit 5 is set
let register: u32 = 0b0010_0000;
if register & (1 << 5) != 0 {
    // Bit 5 is HIGH
}
}

Toggle a Bit (Flip state)

#![allow(unused)]
fn main() {
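// Toggle bit 5: if it was ON, turn OFF; if OFF, turn ON
let mut register: u32 = 0b0010_0000;
register ^= 1 << 5;
// register is now 0b0000_0000; toggling again restores 0b0010_0000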
}

These four patterns — OR to set, AND-NOT to clear, AND to check, XOR to toggle — are the fundamental vocabulary of hardware programming. Embassy abstracts most of this away, but understanding what happens underneath will save you when debugging.

Think About It: Why do we use 1 << 5 instead of just writing 0b0010_0000? What if you need to set bit 13, or bit 27? Shifting is easier to read and less error-prone for arbitrary bit positions.
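The four patterns are easy to wrap in tiny helper functions that take the bit position as a parameter. This is only a sketch with made-up names; it works on plain u32 values, so you can try it on your PC before touching a real register.

```rust
/// OR to set: return `reg` with bit `n` forced to 1.
pub const fn set_bit(reg: u32, n: u32) -> u32 {
    reg | (1 << n)
}

/// AND-NOT to clear: return `reg` with bit `n` forced to 0.
pub const fn clear_bit(reg: u32, n: u32) -> u32 {
    reg & !(1 << n)
}

/// AND to check: true if bit `n` of `reg` is 1.
pub const fn is_set(reg: u32, n: u32) -> bool {
    reg & (1 << n) != 0
}

/// XOR to toggle: return `reg` with bit `n` flipped.
pub const fn toggle_bit(reg: u32, n: u32) -> u32 {
    reg ^ (1 << n)
}
```

Here the shift pays off: set_bit(0, 27) produces 0x0800_0000 without anyone having to count zeros. The bit index must be in the range 0 to 31 for a 32-bit register.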

Summary

You now know enough electronics to safely connect components to an STM32 and enough binary math to understand what registers are doing. None of this needs to be memorized — you will internalize it through practice.

In the next chapter, we will look at what a microcontroller actually is, what is inside it, and why it is different from the computer you are reading this on.

Chapter 2: What Is a Microcontroller?

You are reading this on a computer that has a CPU, gigabytes of RAM, a hard drive, a GPU, a network card, and an operating system managing it all. A microcontroller is all of that squeezed onto a single chip the size of your fingernail — minus the luxury.

A Complete Computer on a Chip

A microcontroller (MCU) is a self-contained computer on a single integrated circuit. It has:

  • A CPU to execute instructions
  • Flash memory to store your program (survives power loss)
  • RAM to store variables while running
  • Input/Output peripherals to interact with the physical world

All of this costs between 50 and 500 INR, draws milliwatts of power, and fits in a package smaller than a postage stamp.

Fun Fact: There are more microcontrollers in your house than people on your street. Your washing machine, microwave, remote control, thermostat, elevator, car key fob, and electric toothbrush all contain at least one. A modern car has 50-100 of them.

How a Microcontroller Program Works

Every microcontroller program follows the same structure:

Power on
  → Initialize hardware (configure clocks, pins, peripherals)
  → Loop forever:
      1. Read inputs (buttons, sensors, communication)
      2. Process (make decisions, calculate)
      3. Write outputs (LEDs, motors, communication)

There is no operating system. No bootloader menu. No login screen. The moment power is applied, the chip looks up the address of your reset handler at the start of Flash memory and begins executing your program. It runs until power is removed or the universe ends, whichever comes first.

Your program lives in Flash (non-volatile — it persists without power). Your variables live in RAM (volatile — they vanish the instant power is cut). This is why your microcontroller remembers its program after unplugging but starts with fresh variable values every boot.

In Embassy, this looks like:

#![no_std]
#![no_main]

use embassy_executor::Spawner;
use embassy_stm32::gpio::{Input, Level, Output, Pull, Speed};
use embassy_time::Timer;
use defmt_rtt as _;
use panic_probe as _;

#[embassy_executor::main]
async fn main(_spawner: Spawner) {
    // 1. Initialize hardware
    let p = embassy_stm32::init(Default::default());
    let mut led = Output::new(p.PC13, Level::High, Speed::Low);
    let button = Input::new(p.PA0, Pull::Up);

    // 2. Loop forever
    loop {
        // Read input
        if button.is_low() {
            // Process + Write output
            led.set_low(); // LED on (active low)
        } else {
            led.set_high(); // LED off
        }
        Timer::after_millis(10).await;
    }
}

What Is Inside an STM32

Let us open up the block diagram and look at the major components.

CPU Core: ARM Cortex-M

Every STM32 uses an ARM Cortex-M processor core. ARM does not make chips — they design the processor core and license it to manufacturers like STMicroelectronics. This is why STM32, nRF52, RP2040, and dozens of other chips all share the same instruction set.

Clock speeds range from 48 MHz on entry-level parts to 480 MHz on the H7 series. For context, the original IBM PC ran at 4.77 MHz. Even a "slow" microcontroller is an order of magnitude faster.

Flash Memory (64 KB - 2 MB)

This is where your compiled program lives. Flash is non-volatile (survives power cycles) but has limited write endurance — typically 10,000 write/erase cycles. You write to it when flashing firmware, not during normal operation.

RAM (8 KB - 1 MB)

This is your working memory for variables, buffers, and the stack. It is fast and has unlimited write cycles, but it loses everything when power is removed. Sounds small? A well-written embedded program can do remarkable things in 20 KB of RAM.

Peripherals

This is where it gets interesting. Peripherals are hardware blocks built into the chip that handle specific tasks:

| Peripheral | What It Does |
|---|---|
| GPIO | General Purpose I/O: read buttons, drive LEDs, toggle any pin |
| UART | Serial communication: talk to GPS modules, Bluetooth, PC terminal |
| SPI | Fast synchronous bus: displays, SD cards, flash memory |
| I2C | Two-wire bus: sensors, EEPROMs, many breakout boards |
| Timers | Count time, generate PWM, capture pulse widths |
| ADC | Analog-to-Digital: read voltages from sensors |
| DMA | Direct Memory Access: move data without CPU involvement |
| USB | USB device/host: your chip can be a USB keyboard or mass storage |
| CAN | Controller Area Network: automotive/industrial communication |

Each peripheral operates independently of the CPU. While your code is processing sensor data, the UART can be receiving bytes, the DMA can be moving ADC samples into RAM, and a timer can be generating PWM — all simultaneously, all in hardware.

The Bus System

Peripherals connect to the CPU through buses — digital highways for data:

  • AHB (Advanced High-performance Bus) — fast bus for GPIO, DMA, memory
  • APB1 (Advanced Peripheral Bus 1) — slower bus for UART, I2C, basic timers
  • APB2 — slightly faster peripheral bus for SPI1, ADC, advanced timers

This matters because each bus has a maximum clock speed. On the STM32F411, APB1 maxes out at 50 MHz while APB2 can run at 100 MHz. Peripherals on APB2 can operate faster.

The Clock System

The clock system is the heartbeat of the microcontroller. It determines how fast everything runs. A typical STM32 clock tree:

8 MHz HSE crystal
    → PLL (Phase-Locked Loop) multiplies to 96 MHz
        → SYSCLK = 96 MHz (CPU runs at this speed)
        → AHB = 96 MHz (GPIO, DMA)
        → APB1 = 48 MHz (UART, I2C, basic timers)
        → APB2 = 96 MHz (SPI, ADC, advanced timers)

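Embassy sets up the clock tree from the configuration you pass to embassy_stm32::init (with Default::default() the chip typically stays on its internal oscillator rather than the PLL), but understanding it helps when debugging timing issues.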
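To make the clock tree less abstract, here is a rough model of it in plain Rust. It follows the F4-style main PLL, where the input is divided by M, multiplied by N, and divided by P. The struct, the function names, and the specific divider values used in the note below are assumptions chosen for illustration, not settings taken from this book or from Embassy.

```rust
/// Clock frequencies in Hz for the simplified tree shown above.
#[derive(Debug, PartialEq, Eq)]
pub struct Clocks {
    pub sysclk: u32,
    pub ahb: u32,
    pub apb1: u32,
    pub apb2: u32,
}

/// F4-style main PLL: VCO = HSE / M * N, SYSCLK = VCO / P.
pub fn pll_sysclk_hz(hse_hz: u32, m: u32, n: u32, p: u32) -> u32 {
    hse_hz / m * n / p
}

/// Derive the bus clocks from SYSCLK and the three bus prescalers.
/// APB1 and APB2 are both divided down from the AHB clock.
pub fn bus_clocks(sysclk: u32, ahb_div: u32, apb1_div: u32, apb2_div: u32) -> Clocks {
    let ahb = sysclk / ahb_div;
    Clocks {
        sysclk,
        ahb,
        apb1: ahb / apb1_div,
        apb2: ahb / apb2_div,
    }
}
```

With an 8 MHz crystal, pll_sysclk_hz(8_000_000, 4, 96, 2) gives 96 MHz, and bus_clocks(96_000_000, 1, 2, 1) reproduces the tree above with APB1 at 48 MHz. A 25 MHz crystal reaches the same 96 MHz with M = 25 and N = 192, which is why the crystal frequency alone does not tell you how fast the chip runs.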

The Cortex-M Family

Not all ARM cores are equal. Here is what you will encounter across the STM32 lineup:

| Core | Features | STM32 Families | Typical Clock |
|---|---|---|---|
| Cortex-M0 / M0+ | Basic, low power, no FPU | F0, G0, L0 | 32-64 MHz |
| Cortex-M3 | Bit-banding, no FPU | F1 | 72 MHz |
| Cortex-M4F | DSP instructions, single-precision FPU | F3, F4, L4, G4, WB | 64-180 MHz |
| Cortex-M7 | Double-precision FPU, I/D cache, branch prediction | F7, H7 | 216-480 MHz |
| Cortex-M33 | TrustZone security, FPU, DSP | L5, U5, H5 | 110-250 MHz |

Why the FPU Matters

FPU stands for Floating Point Unit: dedicated hardware for math on fractional (floating-point) numbers. Without an FPU, calculating 3.14 * 2.0 requires dozens of integer instructions emulating floating-point arithmetic. With an FPU, it is a single instruction.

The difference is dramatic: 10x to 50x faster for floating-point operations. If your project involves sensor math, PID control loops, audio processing, or anything with fractional numbers, choose a chip with an FPU (Cortex-M4F or higher).

Think About It: A Cortex-M0+ running at 64 MHz without an FPU will usually be far slower at floating-point math than a Cortex-M4F running at 48 MHz. Raw clock speed is not the whole story.

Memory-Mapped I/O: THE Key Concept

This is the single most important concept in embedded programming. Read this section twice.

On your desktop computer, hardware is accessed through drivers and operating system calls. On a microcontroller, hardware is accessed by reading and writing specific memory addresses.

Every peripheral, every register, every control bit lives at a fixed address in the chip's memory map:

| Address Range | What Lives There |
|---|---|
| 0x0800_0000 - 0x0807_FFFF | Flash memory (your program; also visible at 0x0000_0000 when booting from Flash) |
| 0x2000_0000 - 0x2001_FFFF | RAM (your variables) |
| 0x4002_0000 - 0x4002_03FF | GPIOA registers |
| 0x4000_4400 - 0x4000_47FF | USART2 registers |
| 0x4001_3000 - 0x4001_33FF | SPI1 registers |

When you write a value to address 0x4002_0014, you are not writing to memory. You are changing the physical voltage on the GPIOA output pins. That is memory-mapped I/O: the hardware pretends to be memory, and writing to those "memory locations" controls real-world electrical signals.

Here is what driving pin PA5 high looks like in raw register manipulation (C-style), versus Embassy:

// Raw register access (what happens underneath)
// Set bit 5 of GPIOA ODR to drive PA5 high
unsafe {
    let gpioa_odr = 0x4002_0014 as *mut u32;
    let current = core::ptr::read_volatile(gpioa_odr);
    core::ptr::write_volatile(gpioa_odr, current | (1 << 5));
}

// Embassy (what you actually write); `p` comes from embassy_stm32::init()
let mut pin = Output::new(p.PA5, Level::Low, Speed::Low);
pin.set_high();

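Both end with PA5 driven high. The raw version assumes the GPIOA clock is already enabled and the pin is already configured as an output; Embassy handles those steps too, and wraps the horror in a safe, readable API.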
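You can practice the raw pattern on your PC without any hardware by pointing it at an ordinary variable instead of a real register. The sketch below does that. The helper names are invented for this example, and the stand-in variable is only an assumption-friendly substitute: on a desktop you must never dereference the actual address 0x4002_0014.

```rust
use core::ptr::{read_volatile, write_volatile};

/// GPIOA base address and ODR offset, as used in the text above.
pub const GPIOA_BASE: usize = 0x4002_0000;
pub const ODR_OFFSET: usize = 0x14;

/// Address of a register given its peripheral base and byte offset.
pub const fn reg_addr(base: usize, offset: usize) -> usize {
    base + offset
}

/// Volatile read-modify-write that sets one bit of a 32-bit register.
///
/// # Safety
/// `reg` must point to a valid, aligned u32 that may be read and written.
pub unsafe fn set_bit_volatile(reg: *mut u32, bit: u32) {
    unsafe {
        let current = read_volatile(reg);
        write_volatile(reg, current | (1 << bit));
    }
}
```

On the PC you would declare a variable such as `let mut fake_odr: u32 = 0;` and pass `&mut fake_odr` to set_bit_volatile. On the chip, the pointer would instead be reg_addr(GPIOA_BASE, ODR_OFFSET) cast to `*mut u32`. The logic is identical; only the address changes.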

Registers: The 32-Bit Control Panel

A register is a 32-bit value at a fixed address that controls or reports the state of a peripheral. Think of it as a row of 32 tiny switches.

For example, the GPIO Output Data Register (ODR) for GPIOA:

Bit:  31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
      [            Reserved (read as 0)                ]

Bit:  15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
      P15 P14 P13 P12 P11 P10 P9 P8 P7 P6 P5 P4 P3 P2 P1 P0

Each bit directly controls one pin. Bit 5 = PA5. Set it to 1 and the pin goes to 3.3V. Set it to 0 and the pin goes to 0V.
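Reading a register value back into pin numbers is a useful debugging habit. A small sketch, with a hypothetical function name, that lists which pins an ODR value drives high:

```rust
/// Pins (0 to 15) whose ODR bit is 1.
/// The upper 16 bits of ODR are reserved, so they are ignored here.
pub fn pins_driven_high(odr: u32) -> Vec<u8> {
    (0u8..16)
        .filter(|&pin| odr & (1u32 << pin) != 0)
        .collect()
}
```

For an ODR value of 0x0000_0020 this returns a list containing only 5, meaning PA5 is the single pin being driven to 3.3V.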

Registers come in three flavors:

  • Control registers — you write to configure behavior (e.g., set a pin as input or output)
  • Status registers — you read to check what happened (e.g., has a UART byte arrived?)
  • Data registers — you read/write to exchange data (e.g., the byte to transmit via UART)

The Toolchain

Code compiled for your PC will not run on an STM32, because the two have completely different processors. You need a cross-compiler: a tool that runs on your PC but produces code for the ARM Cortex-M.

The Rust embedded toolchain consists of:

| Component | Purpose |
|---|---|
| rustup target add thumbv7em-none-eabihf | Install the cross-compilation target |
| probe-rs | Flash firmware and debug via ST-Link/DAPLink |
| defmt + defmt-rtt | Lightweight logging over the debug probe |
| cargo embed or cargo run (with probe-rs) | Build, flash, and run in one command |

The target triple thumbv7em-none-eabihf breaks down as:

  • thumb — ARM Thumb instruction set
  • v7em — ARMv7E-M architecture (Cortex-M4/M7)
  • none — no operating system
  • eabi — Embedded ABI (calling convention)
  • hf — hardware floating point
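The triple above fits the Cortex-M4F and M7. Other cores in the STM32 lineup use different Rust targets, and a lookup like the following sketch can serve as a reminder. The enum and function are invented for this illustration; the target names themselves are standard Rust targets, and the M33 entry assumes you want the hardware FPU.

```rust
/// Cortex-M cores found across the STM32 families discussed in this chapter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Core {
    M0,
    M0Plus,
    M3,
    M4F,
    M7,
    M33,
}

/// Rust compilation target usually chosen for each core.
pub fn rust_target(core: Core) -> &'static str {
    match core {
        Core::M0 | Core::M0Plus => "thumbv6m-none-eabi",
        Core::M3 => "thumbv7m-none-eabi",
        Core::M4F | Core::M7 => "thumbv7em-none-eabihf",
        Core::M33 => "thumbv8m.main-none-eabihf",
    }
}
```

Whatever rust_target returns is the name you would pass to rustup target add. Note that the cores without an FPU use targets ending in eabi rather than eabihf.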

What Makes Embedded Different

If you are coming from desktop or web development, embedded is a different world:

| Desktop | Embedded |
|---|---|
| Operating system manages everything | No OS: your code is the only thing running |
| Print to terminal with println! | No terminal: use defmt over debug probe or UART |
| Program exits when done | Runs forever: main must never return |
| Timing is "fast enough" | Real-time: a 1 ms deadline means 1 ms, not "roughly 1 ms" |
| Bugs crash a program | Bugs affect physical hardware: wrong output can damage circuits |
| Gigabytes of RAM | Kilobytes of RAM: every byte matters |

Fun Fact: When NASA's Voyager 1 was reprogrammed in 2023, engineers uploaded new code to a computer with 69 KB of memory, less than the 128 KB of RAM on an STM32F411. It has been running since 1977, 15 billion miles from Earth. That is embedded programming at its finest.

Summary

A microcontroller is a self-contained computer on a chip. It has a CPU, Flash, RAM, and peripherals — all controlled by reading and writing to specific memory addresses. The STM32 family uses ARM Cortex-M cores ranging from the basic M0 to the powerful M7, all programmed using the same Rust toolchain.

The key insight is memory-mapped I/O: hardware registers live at fixed addresses, and writing to those addresses controls physical pins and peripherals. Embassy abstracts this into safe Rust APIs, but the registers are always there underneath.

Next, we will explore the STM32 family in detail — which chip to choose, what the part numbers mean, and how to navigate the (enormous) datasheet.

Chapter 3: The STM32 Family

You have decided to use an STM32. Good choice — you are now part of the largest 32-bit microcontroller ecosystem on the planet. But with over 1,000 part numbers across a dozen families, choosing the right chip can feel overwhelming. This chapter is your map.

Why STM32?

STMicroelectronics ships over 2 billion STM32 units per year. That scale matters because it means:

  • Price: 50-500 INR per chip, even in single quantities
  • Availability: stocked by every major distributor, always in production
  • Consistent API: learn the peripheral registers on one STM32 and they are nearly identical across the entire lineup
  • Documentation: the best datasheets and reference manuals in the industry, all free
  • Community: the largest ecosystem of libraries, forums, tutorials, and example code for any MCU family
  • Embassy support: first-class async Rust support with embassy-stm32

Fun Fact: STM32 chips are inside products from Samsung, Dyson, DJI, Tesla, GoPro, and thousands of other companies. When you learn STM32, you are learning the same platform that professional engineers use in shipping products.

The Naming System: Decoding a Part Number

Every STM32 part number tells you exactly what the chip is. Let us decode STM32H743VIT6:

| Segment | Value | Meaning |
|---|---|---|
| STM32 | STM32 | STMicroelectronics 32-bit MCU |
| H | H | Family: High Performance |
| 7 | 7 | Series: Cortex-M7, 480 MHz |
| 43 | 43 | Sub-family: full peripheral set |
| V | V | Pin count: 100 pins (LQFP100) |
| I | I | Flash size: 2 MB |
| T | T | Package: LQFP (Thin Quad Flat Pack) |
| 6 | 6 | Temperature range: -40 to +85 C (industrial) |

Common pin count codes: C = 48 pins, R = 64 pins, V = 100 pins, Z = 144 pins, A = 169 pins.

Common Flash size codes: 8 = 64 KB, B = 128 KB, C = 256 KB, E = 512 KB, G = 1 MB, I = 2 MB.

Once you learn this system, you can glance at any STM32 part number and immediately know its class, capabilities, and package.

Think About It: If you see STM32F411CEU6, what can you tell? F4 family (Cortex-M4F), 11 series, 48-pin count (C), 512 KB Flash (E), UFQFPN package (U), industrial temperature (6). That is the "Black Pill" — the board we recommend for starting out.
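The decoding rules are mechanical enough to put into code. The sketch below handles only the 13-character pattern used in the two examples above (STM32, two family characters, two line digits, then pin, Flash, package, and temperature codes) and only the pin and Flash codes listed in this section. Real part numbers come in more shapes than this, so treat the struct and function as a hypothetical teaching aid rather than a complete decoder.

```rust
/// Fields decoded from a part number such as STM32F411CEU6.
#[derive(Debug, PartialEq, Eq)]
pub struct PartInfo {
    pub family: String, // e.g. "F4"
    pub line: String,   // e.g. "11"
    pub pins: u16,
    pub flash_kb: u32,
    pub package: char,
    pub temp_code: char,
}

/// Decode a part number of the form STM32 + 8 characters.
/// Returns None for anything that does not match the pattern or the code tables.
pub fn decode_part_number(part: &str) -> Option<PartInfo> {
    let rest = part.strip_prefix("STM32")?;
    let c: Vec<char> = rest.chars().collect();
    if c.len() != 8 {
        return None;
    }
    // Pin count codes listed in this section
    let pins = match c[4] {
        'C' => 48,
        'R' => 64,
        'V' => 100,
        'Z' => 144,
        'A' => 169,
        _ => return None,
    };
    // Flash size codes listed in this section, in KB
    let flash_kb = match c[5] {
        '8' => 64,
        'B' => 128,
        'C' => 256,
        'E' => 512,
        'G' => 1024,
        'I' => 2048,
        _ => return None,
    };
    Some(PartInfo {
        family: c[0..2].iter().collect(),
        line: c[2..4].iter().collect(),
        pins,
        flash_kb,
        package: c[6],
        temp_code: c[7],
    })
}
```

Feeding it "STM32F411CEU6" yields family F4, 48 pins, and 512 KB of Flash, matching the manual decoding above. Anything it does not recognize comes back as None rather than a guess.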

The Complete Series Guide

Entry-Level: When Simple Is Enough

| Family | Core | Max Clock | Flash | RAM | Price Range | Best For |
|---|---|---|---|---|---|---|
| F0 | Cortex-M0 | 48 MHz | 16-256 KB | 4-32 KB | 50-150 INR | Simple control, cost-sensitive |
| G0 | Cortex-M0+ | 64 MHz | 16-512 KB | 8-144 KB | 50-150 INR | Modern F0 replacement, better peripherals |
| L0 | Cortex-M0+ | 32 MHz | 8-192 KB | 2-20 KB | 60-180 INR | Ultra-low-power battery devices |

These are the chips you use when you need to toggle a relay, read a temperature sensor, or drive a simple display — and you need to do it for the absolute minimum cost. No FPU, modest resources, but often all you need.

The G0 is the modern pick in this category. It has USB-C support, better ADCs, and more RAM than the aging F0, at the same price.

Mainstream: The Sweet Spot

| Family | Core | Max Clock | Flash | RAM | Price Range | Best For |
|---|---|---|---|---|---|---|
| F1 | Cortex-M3 | 72 MHz | 16 KB - 1 MB | 6-96 KB | 80-250 INR | Legacy designs, huge code base |
| F3 | Cortex-M4F | 72 MHz | 16-512 KB | 16-80 KB | 100-300 INR | Mixed-signal, motor control |
| G4 | Cortex-M4F | 170 MHz | 32 KB - 1 MB | 16-128 KB | 120-350 INR | Modern F3 replacement, math-heavy |

The F1 is the granddaddy — the original STM32 from 2007. Millions of designs use it. It lacks an FPU but remains popular for simple applications.

The G4 is the modern choice here. At 170 MHz with an FPU, DSP instructions, hardware math accelerators (CORDIC and FMAC), and advanced timers, it is a powerhouse for motor control, power conversion, and signal processing.

Workhorse: The One Everyone Knows

Family | Core | Max Clock | Flash | RAM | Price Range | Best For
F4 | Cortex-M4F | 84-180 MHz | 64 KB - 2 MB | 64-384 KB | 150-400 INR | General purpose, learning, prototyping

The STM32F4 is the most popular STM32 family. Period. It is fast enough for most applications, cheap enough for hobby projects, and has the largest collection of tutorials, libraries, and community support.

The STM32F411 "Black Pill" board — available for under 300 INR on AliExpress — is our recommended starting board. It runs at 100 MHz, has 512 KB Flash, 128 KB RAM, and enough peripherals for anything a beginner will encounter.

Fun Fact: The STM32F4 was used in the original PX4 flight controller for drones. Many open-source autopilot designs still use it today.

High Performance: When You Need Speed

Family | Core | Max Clock | Flash | RAM | Price Range | Best For
F7 | Cortex-M7 | 216 MHz | 64 KB - 2 MB | 64-512 KB | 300-600 INR | Graphics, networking, complex DSP
H7 | Cortex-M7 | 480 MHz | 128 KB - 2 MB | 128 KB - 1 MB | 350-800 INR | Maximum performance, dual-core
H5 | Cortex-M33 | 250 MHz | 128 KB - 2 MB | 256-640 KB | 250-500 INR | Modern high-perf with TrustZone

The H7 is the flagship. At 480 MHz with double-precision FPU, L1 cache, and up to 1 MB of RAM, it can run a graphical user interface, process audio in real time, or serve as the brain of a robot — all without an operating system.

Some H7 variants are dual-core, with a Cortex-M7 and a Cortex-M4 on the same chip. The M7 handles heavy processing while the M4 manages real-time I/O.

Ultra-Low-Power: Years on a Battery

Family | Core | Max Clock | Flash | RAM | Price Range | Best For
L4 | Cortex-M4F | 80 MHz | 64 KB - 1 MB | 40-320 KB | 150-400 INR | Battery sensors, wearables, IoT
U5 | Cortex-M33 | 160 MHz | 256 KB - 4 MB | 256-2560 KB | 200-500 INR | Next-gen ultra-low-power, TrustZone

The L4 draws as little as 33 nA in shutdown mode. At that current, a CR2032 coin cell (230 mAh) would theoretically last close to 800 years, ignoring the battery's own self-discharge. In practice, an L4-based sensor node that wakes every few minutes to take a measurement and then goes back to sleep can run for 5-10 years on a single battery.
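The battery arithmetic above is easy to check on a host machine. The sketch below is plain Rust, not firmware. The function names are hypothetical, and the duty-cycle figures in the usage note (10 mA while awake, 0.1 s awake every 5 minutes) are illustrative assumptions rather than measured values.

```rust
/// Battery life in years for a constant current draw.
/// Ignores self-discharge and capacity derating.
fn battery_life_years(capacity_mah: f64, current_na: f64) -> f64 {
    let current_ma = current_na * 1e-6; // nA -> mA
    let hours = capacity_mah / current_ma;
    hours / (24.0 * 365.0)
}

/// Average current (nA) of a node that is awake for `active_s` seconds
/// out of every `period_s` seconds and asleep for the rest.
fn average_current_na(sleep_na: f64, active_na: f64, active_s: f64, period_s: f64) -> f64 {
    (active_na * active_s + sleep_na * (period_s - active_s)) / period_s
}
```

With 230 mAh and a constant 33 nA, `battery_life_years` gives a little under 800 years. Feeding it `average_current_na(33.0, 10_000_000.0, 0.1, 300.0)` instead gives an average of about 3.4 µA and a life of roughly 8 years, which is why the practical figure is so much lower than the shutdown-only number.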

The U5 is the modern successor with the Cortex-M33 core, TrustZone security, and even more RAM. It is ideal for secure IoT applications.

Wireless: Built-In Radio

Family | Core | Radio | Best For
WB | Cortex-M4F + M0+ | Bluetooth 5.0, Zigbee, Thread | Short-range wireless, mesh networks
WL | Cortex-M4 + M0+ | LoRa, Sigfox | Long-range (km) low-power IoT

The WB series has a full Bluetooth 5.0 stack running on the M0+ core while your application runs on the M4F. No external radio module needed.

The WL series integrates a LoRa radio — capable of communication over several kilometers at extremely low power. Perfect for agricultural sensors, city infrastructure, and remote monitoring.

Choosing the Right Chip

If you are staring at these tables wondering where to start, here is a simple decision matrix:

Your Situation | Choose This
Learning embedded for the first time | STM32F411 (Black Pill)
Cost-sensitive production design | STM32G030 or STM32G070
Battery-powered IoT sensor | STM32L476 or STM32U575
Motor control or power electronics | STM32G431
Graphics, audio, or heavy DSP | STM32H743
Bluetooth or Zigbee needed | STM32WB55
Long-range wireless (LoRa) | STM32WL55
Maximum security (TrustZone) | STM32U575 or STM32H563

Think About It: The "best" chip is not the most powerful one — it is the cheapest one that meets your requirements. A $0.50 G0 running a simple thermostat is a better engineering decision than a $5 H7 doing the same job.

STM32H743: A Closer Look

Let us spotlight the STM32H743 to see what a high-end STM32 offers:

  • CPU: ARM Cortex-M7 at 480 MHz with double-precision FPU
  • Flash: 2 MB (dual-bank, allowing read-while-write)
  • RAM: 1 MB total (including 128 KB DTCM at zero wait states)
  • Instruction/Data cache: 16 KB each (L1 cache, just like a desktop CPU)
  • ADC: 3x 16-bit ADCs at 3.6 Msps (mega-samples per second)
  • DAC: 2x 12-bit DACs
  • Timers: 22 timers including advanced motor control timers
  • Communication: 4x USART, 4x UART, 1x LPUART, 6x SPI, 4x I2C, USB OTG HS, 2x CAN-FD, Ethernet MAC
  • DMA: 2x DMA controllers with 8 streams each, plus MDMA and BDMA
  • Other: JPEG codec, Chrom-ART GPU, true random number generator, CRC hardware (the AES/DES crypto accelerator is found on the sibling STM32H753)

This is a staggering amount of hardware for a chip that costs under 800 INR. For perspective, this single chip has more processing power and more peripherals than the entire computer that guided Apollo 11 to the moon.

Cross-Series Development with Embassy

This is where the STM32 + Embassy combination truly shines. Moving your code from one STM32 family to another requires changing three things:

  1. The chip feature in Cargo.toml
  2. The compilation target
  3. The pin names (because different packages have different pinouts)

The logic stays the same.

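// This blink code works on any STM32 that Embassy supports: just change the pin name
#![no_std]
#![no_main]

use embassy_executor::Spawner;
use embassy_stm32::gpio::{Level, Output, Speed};
use embassy_time::Timer;
use {defmt_rtt as _, panic_probe as _};

#[embassy_executor::main]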
async fn main(_spawner: Spawner) {
    let p = embassy_stm32::init(Default::default());

    // On Black Pill F411: PC13
    // On Nucleo F446RE: PA5
    // On Nucleo H743ZI: PB0
    let mut led = Output::new(p.PC13, Level::High, Speed::Low);

    loop {
        led.toggle();
        Timer::after_millis(500).await;
    }
}

And in Cargo.toml:

# For STM32F411:
embassy-stm32 = { version = "0.1", features = ["stm32f411ce", "time-driver-any"] }
# target: thumbv7em-none-eabihf

# For STM32H743:
# embassy-stm32 = { version = "0.1", features = ["stm32h743vi", "time-driver-any"] }
# target: thumbv7em-none-eabihf

# For STM32G030:
# embassy-stm32 = { version = "0.1", features = ["stm32g030f6", "time-driver-any"] }
# target: thumbv6m-none-eabi

Notice how the Embassy API is identical. Output::new, led.toggle(), Timer::after_millis — these work across all STM32 families. The Embassy HAL (Hardware Abstraction Layer) translates your intent into the correct register writes for whichever chip you are targeting.

The Reference Manual: Your 3,300-Page Best Friend

Every STM32 family has a Reference Manual (RM) published by STMicroelectronics. The H7's reference manual, RM0433, is over 3,300 pages long.

Do not try to read it cover to cover. Use it as a reference:

  • Need to configure the ADC? Search for "ADC" in the PDF and read that chapter.
  • Timer not behaving? Check the timer register descriptions.
  • Weird peripheral behavior? The RM has timing diagrams, state machines, and detailed explanations for every peripheral.

The reference manual is the ultimate source of truth. Tutorials, Stack Overflow answers, and blog posts can be wrong or outdated. The RM is written by the engineers who designed the silicon. Bookmark it. You will reference it hundreds of times.

Fun Fact: The RM0433 for the STM32H7 series, if printed single-sided, would be a stack of paper roughly 30 to 40 cm tall. Nobody has read all 3,300 pages. But everyone has read the 50 pages that matter for their project.

Summary

The STM32 family spans from ultra-cheap entry-level chips to 480 MHz dual-core powerhouses, all sharing a consistent peripheral API and supported by Embassy's unified Rust HAL. The naming system tells you everything about a chip at a glance. For learning, start with the STM32F411 Black Pill. For production, choose the cheapest chip that meets your requirements.

The reference manual is your most important resource. It is enormous, but you only need to read the chapters relevant to the peripherals you are using. Treat it like a dictionary, not a novel.

In the next chapter, we will set up the Rust embedded development environment and get your first program running on real hardware.

Setting Up Your Development Environment

GPIO — General Purpose Input/Output

GPIO is how your microcontroller touches the physical world. Every LED you blink, every button you read, every sensor you talk to — it all starts with GPIO pins. Think of them as tiny, programmable electrical switches that you control with code.

What Exactly Is a GPIO Pin?

An STM32 chip has dozens (sometimes over a hundred) of metal pins sticking out of it. Many of these are General Purpose Input/Output pins — meaning you decide whether they send signals out or listen for signals coming in.

These pins are organized into ports, labeled GPIOA through GPIOK (depending on your chip). Each port holds up to 16 pins, numbered 0 through 15. So when someone says PA5, they mean Port A, Pin 5. PD12 is Port D, Pin 12. Simple as that.
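The naming rule above is simple enough to express in a few lines. The following host-side sketch uses a hypothetical `parse_pin` helper and assumes ports A through K, the range mentioned in the text. Real chips may expose fewer ports.

```rust
/// Splits a pin name such as "PA5" into its port letter and pin number.
/// Hypothetical helper for illustration; accepts ports A..=K and pins 0..=15.
fn parse_pin(name: &str) -> Option<(char, u8)> {
    let rest = name.strip_prefix('P')?;
    let mut chars = rest.chars();
    let port = chars.next()?;
    if !('A'..='K').contains(&port) {
        return None;
    }
    // Everything after the port letter must be a plain decimal number.
    let digits = chars.as_str();
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let pin: u8 = digits.parse().ok()?;
    if pin > 15 {
        return None;
    }
    Some((port, pin))
}
```

`parse_pin("PD12")` returns `Some(('D', 12))`, while `parse_pin("PA16")` returns `None` because a port only has pins 0 through 15.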

💡 Fun Fact: The STM32H743 has 11 GPIO ports (A through K). That is 176 possible pin slots, of which up to 168 are available as usable I/O on the largest packages. Your laptop's CPU has well over a thousand pins, but you never touch them directly. That direct access is what makes embedded programming special.

Pin Modes Explained

Every GPIO pin can be configured into one of several modes. Choosing the right mode is one of the first decisions you make when writing embedded code.

Mode | Direction | Description | Typical Use
Input Floating | In | Pin reads whatever voltage is present, no internal bias | Signals from other ICs with defined output
Input Pull-Up | In | Internal resistor pulls pin HIGH when nothing is connected | Buttons that connect to GND when pressed
Input Pull-Down | In | Internal resistor pulls pin LOW when nothing is connected | Buttons that connect to VCC when pressed
Output Push-Pull | Out | Pin actively drives HIGH or LOW | LEDs, digital control signals
Output Open-Drain | Out | Pin can pull LOW or float (needs external pull-up) | I2C bus, level shifting, shared signal lines
Alternate Function | Special | Pin is controlled by a peripheral (UART, SPI, I2C, etc.) | Serial communication, PWM output
Analog | Special | Pin connects directly to ADC/DAC, digital logic disabled | Reading sensor voltages, audio output

🧠 Think About It: Why does "Input Floating" exist if it's unreliable? Because when another chip is actively driving the line HIGH or LOW, you don't want your internal pull resistor fighting it. Floating input is the polite listener — it doesn't impose its own opinion.

Output Speed

When you configure a pin as output, you also choose its speed — which controls how fast the voltage can transition between LOW and HIGH.

Speed | Edge Rate | When to Use
Low | ~2 MHz | LEDs, relays, anything slow
Medium | ~12.5 MHz | General purpose signals
High | ~50 MHz | SPI, SDIO
Very High | ~100 MHz | High-speed interfaces, SDRAM

Higher speed means the pin switches faster, which is necessary for high-frequency communication. But faster edges also generate more electromagnetic noise and consume more power. Always pick the lowest speed that works for your application.

Blinking an LED with Embassy

Let's start with the embedded equivalent of "Hello, World" — blinking an LED.

#![no_std]
#![no_main]

use embassy_executor::Spawner;
use embassy_stm32::gpio::{Level, Output, Speed};
use embassy_time::Timer;
use {defmt_rtt as _, panic_probe as _};

#[embassy_executor::main]
async fn main(_spawner: Spawner) {
    let p = embassy_stm32::init(Default::default());

    // Configure PE3 as push-pull output, starting LOW
    let mut led = Output::new(p.PE3, Level::Low, Speed::Low);

    loop {
        led.set_high();
        Timer::after_millis(500).await;
        led.set_low();
        Timer::after_millis(500).await;
    }
}

That's it. No register manipulation, no volatile pointers, no unsafe blocks. Embassy's Output::new configures the pin mode, speed, and initial level in one call. The Timer::after_millis(500).await yields control back to the executor while waiting — your CPU can sleep instead of spinning.

Reading a Button

Reading a button is just as straightforward. Most buttons connect the pin to GND when pressed, so we use an internal pull-up resistor to keep the pin HIGH when the button is released.

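#![no_std]
#![no_main]

use embassy_executor::Spawner;
use embassy_stm32::gpio::{Input, Pull};
use embassy_time::Timer;
use {defmt_rtt as _, panic_probe as _};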

#[embassy_executor::main]
async fn main(_spawner: Spawner) {
    let p = embassy_stm32::init(Default::default());

    let button = Input::new(p.PC13, Pull::Up);

    loop {
        if button.is_low() {
            // Button is pressed (pulled to GND)
            defmt::info!("Button pressed!");
        }
        Timer::after_millis(10).await;
    }
}

This works, but it has a problem — we're polling. We check the button every 10 milliseconds whether anything happened or not. There's a better way.

Async Button with ExtiInput

Embassy can use the STM32's EXTI (External Interrupt) hardware to wake up only when a pin actually changes state.

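#![no_std]
#![no_main]

use embassy_executor::Spawner;
use embassy_stm32::exti::ExtiInput;
use embassy_stm32::gpio::Pull;
use embassy_time::Timer;
use {defmt_rtt as _, panic_probe as _};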

#[embassy_executor::main]
async fn main(_spawner: Spawner) {
    let p = embassy_stm32::init(Default::default());

    let mut button = ExtiInput::new(p.PC13, p.EXTI13, Pull::Up);

    loop {
        // CPU sleeps here until the pin goes LOW
        button.wait_for_falling_edge().await;
        defmt::info!("Button pressed!");

        // Simple debounce: ignore further transitions for 50ms
        Timer::after_millis(50).await;

        // Wait for button release before accepting next press
        button.wait_for_rising_edge().await;
        Timer::after_millis(50).await;
    }
}

The wait_for_falling_edge().await call puts the task to sleep until the hardware detects the pin transitioning from HIGH to LOW. Zero CPU usage while waiting.

Button Debouncing

Mechanical buttons don't produce clean electrical transitions. When you press a button, the metal contacts physically bounce, creating 5 to 10 rapid transitions over a few milliseconds before settling.

What you expect:     HIGH ────────┐
                                  └──────── LOW

What actually happens: HIGH ──┐ ┌┐ ┌┐
                              └─┘└─┘└────── LOW
                              ← ~5ms →

Without debouncing, one press might register as multiple presses. The fix is simple: after detecting a transition, wait about 50 milliseconds before checking again. That's what the Timer::after_millis(50).await does in our code above.
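The "ignore everything for 50 ms" rule can also be modelled as a small piece of logic that runs on a host machine, which makes it easy to test without hardware. This is a sketch under assumptions: `Debouncer` and `accept` are hypothetical names, and timestamps are plain milliseconds supplied by the caller rather than read from a timer.

```rust
/// Accepts an edge only if at least `window_ms` have passed
/// since the last accepted edge. Bounce edges inside the window are dropped.
struct Debouncer {
    window_ms: u64,
    last_accepted_ms: Option<u64>,
}

impl Debouncer {
    fn new(window_ms: u64) -> Self {
        Self { window_ms, last_accepted_ms: None }
    }

    /// Returns true if the edge seen at `now_ms` should count as a real press.
    fn accept(&mut self, now_ms: u64) -> bool {
        match self.last_accepted_ms {
            // Still inside the quiet window: treat this edge as bounce.
            Some(last) if now_ms.saturating_sub(last) < self.window_ms => false,
            _ => {
                self.last_accepted_ms = Some(now_ms);
                true
            }
        }
    }
}
```

Feeding a 50 ms debouncer edges at 0, 1, 3, and 4 ms accepts only the first one. A later edge at 200 ms is accepted as a new press.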

💡 Fun Fact: Debouncing was a problem even in the 1960s. Early computer keyboards had hardware debounce circuits — tiny capacitor-resistor networks on every single key. Modern keyboards do it in firmware, just like we do.

Common LED Pins by Board

Different development boards wire their onboard LEDs to different pins. Here's a quick reference so you don't have to hunt through schematics every time.

Board | LED Pin(s) | LED Active State | Notes
WeAct H743 | PE3 | LOW (active-low) | Single blue LED
STM32F407 Discovery | PD12, PD13, PD14, PD15 | HIGH | Green, Orange, Red, Blue
Black Pill (STM32F411) | PC13 | LOW (active-low) | Single blue LED
Blue Pill (STM32F103) | PC13 | LOW (active-low) | Single green LED
Nucleo-64 boards | PA5 | HIGH | Single green LED

🧠 Think About It: Notice that "active-low" means you set the pin LOW to turn the LED on. This is because the LED is wired between VCC and the pin — current flows (LED lights) when the pin pulls LOW. It's counterintuitive at first, but it's a common pattern in hardware design because GPIO pins can often sink more current than they can source.

What's Next?

You now know how to control the physical world — one pin at a time. But those pins need to switch at the right speed, and your UART needs the right baud rate, and your SPI needs the right clock. All of that depends on the clock system, which we'll dive into next.

The Clock System

Every single operation inside your microcontroller happens on the tick of a clock. Reading a GPIO pin, sending a byte over UART, converting an analog voltage — all of it is synchronized to clock edges. If you get the clocks wrong, nothing works right, and the symptoms are baffling.

Why Clocks Matter

Imagine you're talking to a friend over the phone, but your phone plays the audio at 1.5x speed. You'd hear garbled nonsense. That's exactly what happens when two devices try to communicate at different clock speeds — the bits arrive at the wrong time, and the receiver interprets garbage.

Here's the harsh reality of what happens when clocks are misconfigured:

Symptom | Likely Clock Problem
UART prints garbage characters | Baud rate is wrong because SYSCLK isn't what you think
SPI communication fails randomly | APB clock doesn't divide cleanly to SPI baud rate
Timers run at wrong speed | Timer clock prescaler assumes a different input frequency
USB doesn't enumerate | USB needs a 48 MHz clock accurate to about 0.25%, which the plain HSI cannot deliver
Everything works but is painfully slow | Running on default 64 MHz HSI instead of 480 MHz PLL
Program crashes on startup | PLL misconfigured, CPU tries to run at invalid frequency

🧠 Think About It: USB requires a clock accurate to 0.25%. The internal HSI oscillator drifts by about 1%. That's why USB simply will not work without an external crystal or a specially calibrated clock. Precision matters.
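To make the tolerance comparison concrete, here is a tiny host-side check. The `within_tolerance` helper is a hypothetical name, and the example frequencies in the usage note are derived from the 1% and 50 ppm figures quoted in this chapter, not from measurements.

```rust
/// Returns true if `actual_hz` is within `tolerance_percent` of `nominal_hz`.
fn within_tolerance(nominal_hz: f64, actual_hz: f64, tolerance_percent: f64) -> bool {
    let error_percent = ((actual_hz - nominal_hz) / nominal_hz).abs() * 100.0;
    error_percent <= tolerance_percent
}
```

A 48 MHz clock that is 1% fast runs at 48.48 MHz, so `within_tolerance(48_000_000.0, 48_480_000.0, 0.25)` is false. A crystal-derived clock that is 50 ppm fast runs at 48.0024 MHz and passes the same check comfortably.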

Clock Sources

STM32 chips have multiple clock sources, each with different characteristics. You choose based on your accuracy, speed, and power requirements.

HSI — High-Speed Internal

The HSI is an on-chip RC oscillator, typically 64 MHz on H7 series (8 or 16 MHz on other families). It's the default clock source — your chip runs on HSI from the moment it powers up, before any of your code executes.

  • Accuracy: ~1% drift over temperature and voltage changes
  • Advantage: Always available, no external components needed
  • Limitation: Too imprecise for USB, unreliable for high baud rate UART over long periods

HSE — High-Speed External

The HSE uses an external crystal or oscillator, commonly 8 MHz or 25 MHz. This is your precision clock source.

  • Accuracy: Better than 0.01% (50 ppm or less for a typical crystal)
  • Advantage: Precise enough for USB, reliable UART at any baud rate
  • Limitation: Requires a crystal soldered to the board (most dev boards include one)

LSI — Low-Speed Internal

A low-power internal oscillator running at roughly 32 kHz. It's imprecise but cheap in terms of power.

  • Primary use: Independent Watchdog Timer (IWDG)
  • Accuracy: ~5-10% — quite rough, but fine for "reset if firmware is stuck" timers

LSE — Low-Speed External

An external 32.768 kHz crystal — that oddly specific number divides evenly into 1 second (32768 = 2^15), making it perfect for timekeeping.

  • Primary use: Real-Time Clock (RTC)
  • Accuracy: Excellent, typically around 20 ppm, which works out to less than a minute of drift per month
  • Limitation: Requires a dedicated crystal (most boards include one)

💡 Fun Fact: The 32.768 kHz frequency isn't arbitrary — it's exactly 2^15 Hz. A simple 15-stage binary counter can divide it down to exactly 1 Hz. This is the same crystal used in wristwatches since the 1970s.
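The power-of-two property is easy to verify in code. The following sketch, with the hypothetical helper `stages_to_one_hz`, counts how many divide-by-two stages bring a frequency down to exactly 1 Hz, and rejects frequencies that cannot get there by halving alone.

```rust
/// Number of divide-by-two stages needed to reduce `freq_hz` to exactly 1 Hz.
/// Returns None if the frequency is zero or not a power of two.
fn stages_to_one_hz(freq_hz: u32) -> Option<u32> {
    if freq_hz == 0 || !freq_hz.is_power_of_two() {
        return None;
    }
    // For a power of two, the number of trailing zero bits is log2(freq).
    Some(freq_hz.trailing_zeros())
}
```

`stages_to_one_hz(32_768)` returns `Some(15)`, while a rounder-looking value such as 32,000 Hz returns `None` because no chain of simple binary dividers lands it on exactly 1 Hz.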

The PLL — Frequency Multiplier

Your crystal runs at 25 MHz, but your CPU wants 480 MHz. How do you get there? With a Phase-Locked Loop (PLL) — a circuit that multiplies a low input frequency up to a high output frequency.

The PLL has three stages:

Input (HSE)    Divide (DIVM)    Multiply (DIVN)    Divide (DIVP)    Output
  25 MHz    →    /5 = 5 MHz   →   ×192 = 960 MHz  →   /2 = 480 MHz  → SYSCLK

The intermediate VCO (Voltage-Controlled Oscillator) frequency must stay within a valid range (usually 192–960 MHz for H7). You pick the dividers to land within that range while hitting your target output.
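The divide, multiply, divide chain can be checked by hand or with a few lines of host-side Rust. In this sketch `pll_output_hz` and `PllError` are hypothetical names, and the 192 to 960 MHz VCO window is taken from the text above. Check your chip's datasheet for the exact limits of your silicon revision.

```rust
#[derive(Debug, PartialEq)]
enum PllError {
    ZeroDivider,
    VcoOutOfRange(u64),
}

/// Computes the PLL output in Hz as input / divm * divn / divp,
/// rejecting settings whose VCO frequency falls outside 192..=960 MHz
/// (the range quoted in this chapter for the H7).
fn pll_output_hz(input_hz: u64, divm: u64, divn: u64, divp: u64) -> Result<u64, PllError> {
    if divm == 0 || divn == 0 || divp == 0 {
        return Err(PllError::ZeroDivider);
    }
    let ref_hz = input_hz / divm; // reference clock after the input divider
    let vco_hz = ref_hz * divn; // VCO frequency after multiplication
    if !(192_000_000..=960_000_000).contains(&vco_hz) {
        return Err(PllError::VcoOutOfRange(vco_hz));
    }
    Ok(vco_hz / divp)
}
```

`pll_output_hz(25_000_000, 5, 192, 2)` returns `Ok(480_000_000)`, matching the diagram. Dropping the multiplier to 20 gives a 100 MHz VCO and an error instead of a silently wrong clock.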

🧠 Think About It: Why not just use a 480 MHz crystal directly? Because high-frequency crystals are expensive, fragile, and consume more power. It's far more practical to use a cheap low-frequency crystal and multiply it up electronically.

The Clock Tree

The PLL output doesn't go directly to every peripheral. Instead, it feeds through a tree of dividers (prescalers) that provide appropriate frequencies to different bus domains.

                    ┌──────────────────────────────────────────┐
                    │              Clock Tree (H743)            │
                    └──────────────────────────────────────────┘

  HSI (64 MHz) ──┐
                  ├──→ [PLL1] ──→ SYSCLK (up to 480 MHz)
  HSE (25 MHz) ──┘                    │
                                      ├──→ AHB Bus (up to 240 MHz)
                                      │         │
                                      │         ├──→ APB1 (up to 120 MHz)
                                      │         │       └─ UART, I2C, TIM2-7
                                      │         │
                                      │         ├──→ APB2 (up to 120 MHz)
                                      │         │       └─ SPI1, TIM1/8, USART1
                                      │         │
                                      │         └──→ APB3, APB4 ...
                                      │
  LSI (32 kHz) ──────────────────────→ IWDG
  LSE (32.768 kHz) ──────────────────→ RTC

Each APB bus has a divider. When you configure a peripheral's baud rate or frequency, the HAL calculates it relative to that peripheral's bus clock, not SYSCLK. This is why getting the clock tree right is so important — everything downstream depends on it.
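Here is a rough illustration of why a wrong bus clock produces garbage on a UART. The sketch assumes the common 16x oversampling case, where the baud divisor is simply the peripheral's input clock divided by the baud rate. The function names are hypothetical, and the real register layout is described in the reference manual.

```rust
/// Baud divisor for a given peripheral clock, rounded to the nearest integer.
/// Assumes 16x oversampling, where divisor = clock / baud.
fn usart_divisor(pclk_hz: u32, baud: u32) -> u32 {
    (pclk_hz + baud / 2) / baud
}

/// Baud rate actually produced when `divisor` is applied to the real clock.
fn actual_baud(pclk_hz: u32, divisor: u32) -> u32 {
    pclk_hz / divisor
}
```

If the divisor for 115200 baud is computed for a 120 MHz bus, `usart_divisor(120_000_000, 115_200)` gives 1042. If the bus is really running at 64 MHz, `actual_baud(64_000_000, 1042)` gives about 61,420 baud, roughly half of what the other side expects, and every byte arrives scrambled.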

Embassy Clock Configuration

Using Defaults

The simplest configuration uses the internal HSI oscillator with default settings. This is what you get with one line:

let p = embassy_stm32::init(Default::default());

This works for LED blinking and basic GPIO, but you're leaving performance on the table and can't use USB.

Full Speed Configuration (H743 with 25 MHz Crystal)

For real projects, you'll want to configure the clocks explicitly. Here's how to get the STM32H743 running at its full 480 MHz using a 25 MHz external crystal:

use embassy_stm32::Config;
use embassy_stm32::rcc::*;

let mut config = Config::default();

// Use the 25 MHz external crystal
config.rcc.hse = Some(Hse {
    freq: embassy_stm32::time::Hertz(25_000_000),
    mode: HseMode::Oscillator,
});

// Configure PLL1: 25 MHz / 5 × 192 / 2 = 480 MHz
config.rcc.pll1 = Some(Pll {
    source: PllSource::HSE,
    prediv: PllPreDiv::DIV5,       // 25 MHz / 5 = 5 MHz
    mul: PllMul::MUL192,           // 5 MHz × 192 = 960 MHz (VCO)
    divp: Some(PllDiv::DIV2),      // 960 / 2 = 480 MHz → SYSCLK
    divq: Some(PllDiv::DIV4),      // 960 / 4 = 240 MHz → can feed peripherals
    divr: None,
});

config.rcc.sys = Sysclk::PLL1_P;           // Use PLL1 P output as system clock
config.rcc.ahb_pre = AHBPrescaler::DIV2;   // AHB = 480 / 2 = 240 MHz
config.rcc.apb1_pre = APBPrescaler::DIV2;  // APB1 = 240 / 2 = 120 MHz
config.rcc.apb2_pre = APBPrescaler::DIV2;  // APB2 = 240 / 2 = 120 MHz
config.rcc.apb3_pre = APBPrescaler::DIV2;  // APB3 = 240 / 2 = 120 MHz
config.rcc.apb4_pre = APBPrescaler::DIV2;  // APB4 = 240 / 2 = 120 MHz

let p = embassy_stm32::init(config);

Every line has a purpose. The comments show the math — you should always be able to trace the frequency from input crystal to final bus clock by hand.

💡 Fun Fact: The STM32H7 actually has three PLLs (PLL1, PLL2, PLL3). PLL1 drives the CPU. PLL2 and PLL3 can independently clock peripherals like SAI (audio), ADC, or USB at their own precise frequencies. It's like having three independent frequency synthesizers on one chip.

Clock Configuration Summary

Parameter | Default (HSI) | Full Speed (HSE + PLL)
Source | HSI 64 MHz | HSE 25 MHz
SYSCLK | 64 MHz | 480 MHz
AHB | 64 MHz | 240 MHz
APB1 | 64 MHz | 120 MHz
APB2 | 64 MHz | 120 MHz
USB capable | No | Yes (with PLL divider)
Crystal required | No | Yes

Debugging Clock Problems

If things aren't working after changing clock settings, check these in order:

  1. Does your board actually have an HSE crystal? Check the schematic. If there's no crystal, HSE configuration will hang at startup.
  2. Is the crystal frequency correct? WeAct H743 uses 25 MHz. Discovery boards often use 8 MHz. Using the wrong value silently corrupts every frequency downstream.
  3. Are the PLL dividers within valid ranges? The VCO frequency (after multiply, before final divide) must be within the chip's specified range.
  4. Did you set the flash wait states? Higher clock speeds require more wait states for flash memory access. Embassy handles this automatically, but it's good to know.

What's Next?

Now that your clocks are ticking at the right speed, it's time to learn how to respond to events without wasting CPU cycles. The next chapter covers interrupts and how Embassy turns them into clean, composable async tasks.

Interrupts and Async

So far, our code has followed a straight path: do this, wait, do that, repeat. But embedded systems need to react to the real world, and the real world doesn't wait politely for your code to finish a loop iteration.

The Polling Problem

Let's say you want to detect a button press while blinking an LED. The naive approach is polling — checking the button inside your blink loop:

loop {
    led.toggle();
    Timer::after_millis(500).await;

    // Check button between blinks
    if button.is_low() {
        defmt::info!("Pressed!");
    }
}

This has two serious problems. First, if the button is pressed and released during the 500 ms wait, you miss it entirely. Second, you're checking only twice per second, so the worst-case response time is 500 ms. For a button, that's merely annoying. For a safety-critical sensor signal, it's catastrophic.
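The missed-press problem can be reproduced with a simple timing model on a host machine. This is a simplification: the hypothetical `polling_sees_press` helper assumes the loop samples the button at exact multiples of the polling period, starting at time zero, and that the press is a single clean window.

```rust
/// Returns true if polling every `poll_period_ms` (at t = 0, period, 2*period, ...)
/// ever samples the button inside the window [press_start_ms, press_end_ms).
fn polling_sees_press(poll_period_ms: u64, press_start_ms: u64, press_end_ms: u64) -> bool {
    // First poll instant at or after the start of the press.
    let first_poll =
        ((press_start_ms + poll_period_ms - 1) / poll_period_ms) * poll_period_ms;
    first_poll < press_end_ms
}
```

A press lasting from 100 ms to 300 ms is invisible to a 500 ms polling loop, since the first sample after the press starts lands at 500 ms. The same press is caught easily with a 10 ms polling period, at the cost of waking the CPU a hundred times per second.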

🧠 Think About It: Imagine a motor controller that polls a limit switch at 2 Hz. The motor could travel a long distance in 500ms. In industrial systems, missed events don't just cause bugs — they cause damage.

How Interrupts Work

The hardware solution is interrupts. When an event occurs (pin changes state, timer expires, byte received), the hardware immediately pauses whatever the CPU is doing, saves its state, and jumps to a special function called an interrupt handler (or ISR — Interrupt Service Routine). When the handler finishes, the CPU resumes exactly where it left off.

Normal code:    ──── running ─────┐         ┌──── continues ────
                                  │         │
Interrupt:                        └── ISR ──┘
                                  ↑
                          Hardware event fires

STM32 chips use the NVIC (Nested Vectored Interrupt Controller), which supports:

  • Priority levels 0–15 on most families (0 is highest priority, 15 is lowest); the Cortex-M0/M0+ parts offer only four levels
  • Nesting — a higher-priority interrupt can preempt a lower-priority one
  • Dozens of interrupt sources — each peripheral can trigger its own
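The priority rule is worth spelling out, because the numbering runs opposite to intuition. The sketch below models only the basic comparison, using hypothetical helper names, and ignores details such as priority grouping and sub-priorities.

```rust
/// Returns true if an interrupt with priority `pending` would preempt
/// a handler currently running at priority `running`.
/// Lower number means more urgent; equal priorities never preempt each other.
fn preempts(pending: u8, running: u8) -> bool {
    pending < running
}

/// Picks which of several pending interrupts runs first:
/// the one with the numerically lowest priority value.
fn next_to_run(pending: &[u8]) -> Option<u8> {
    pending.iter().copied().min()
}
```

An interrupt at priority 2 preempts a handler running at priority 5, but not the other way around. When priorities 7, 3, and 12 are all pending, priority 3 is serviced first.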

The Traditional Pain

In C (and bare-metal Rust), working with interrupts is notoriously tricky:

  1. You write an ISR function that must be short and fast
  2. You communicate with main code through volatile global variables
  3. You need critical sections to prevent data races
  4. You manage priorities carefully to avoid deadlocks
  5. Debugging is miserable because the flow is non-linear

This is where Embassy changes the game entirely.

Embassy's Async Model

Embassy maps hardware interrupts to Rust's async/await system. Instead of writing ISR callbacks and juggling shared volatile state, you write tasks that look like normal sequential code. When a task calls .await, it yields the CPU and sleeps until the hardware event occurs.

Under the hood, Embassy still uses interrupts — but it wraps them so you never touch the raw interrupt machinery yourself.

Traditional:     main() ←──shared volatile──→ ISR()
                 (manual synchronization, unsafe, error-prone)

Embassy:         async task1().await    async task2().await
                 (compiler-checked, safe, composable)

Your First Async Task

Let's write a button handler as a standalone Embassy task:

#![no_std]
#![no_main]

use embassy_executor::Spawner;
use embassy_stm32::exti::ExtiInput;
use embassy_stm32::gpio::Pull;
use embassy_time::Timer;
use {defmt_rtt as _, panic_probe as _};

#[embassy_executor::task]
async fn button_task(mut button: ExtiInput<'static>) {
    loop {
        button.wait_for_falling_edge().await;
        defmt::info!("Button pressed!");
        Timer::after_millis(50).await; // debounce
    }
}

#[embassy_executor::main]
async fn main(spawner: Spawner) {
    let p = embassy_stm32::init(Default::default());
    let button = ExtiInput::new(p.PC13, p.EXTI13, Pull::Up);
    spawner.spawn(button_task(button)).unwrap();
}

When button_task hits wait_for_falling_edge().await, it sleeps, consuming zero CPU, until the EXTI hardware interrupt fires on the falling edge of the pin.

Running Multiple Concurrent Tasks

The real power shows up when you have several things happening at once. Embassy's executor runs tasks cooperatively — each task runs until it hits an .await, then the executor checks if any other task is ready to run.

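#![no_std]
#![no_main]

use embassy_executor::Spawner;
use embassy_stm32::exti::ExtiInput;
use embassy_stm32::gpio::{Level, Output, Pull, Speed};
use embassy_time::Timer;
use {defmt_rtt as _, panic_probe as _};

#[embassy_executor::task]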
async fn blink_task(mut led: Output<'static>) {
    loop {
        led.toggle();
        Timer::after_millis(500).await;
    }
}

#[embassy_executor::task]
async fn button_task(mut button: ExtiInput<'static>) {
    let mut count: u32 = 0;
    loop {
        button.wait_for_falling_edge().await;
        count += 1;
        defmt::info!("Button pressed {} times", count);
        Timer::after_millis(50).await;
    }
}

#[embassy_executor::task]
async fn heartbeat_task() {
    loop {
        defmt::info!("System alive");
        Timer::after_secs(5).await;
    }
}

#[embassy_executor::main]
async fn main(spawner: Spawner) {
    let p = embassy_stm32::init(Default::default());

    let led = Output::new(p.PE3, Level::High, Speed::Low);
    let button = ExtiInput::new(p.PC13, p.EXTI13, Pull::Up);

    spawner.spawn(blink_task(led)).unwrap();
    spawner.spawn(button_task(button)).unwrap();
    spawner.spawn(heartbeat_task()).unwrap();

    // All tasks are running. Main has nothing else to do.
    // The executor will put the CPU to sleep when all tasks are awaiting.
}

Three tasks run concurrently on a single-core MCU with no RTOS, no threads, and no heap allocation. When all tasks are awaiting, the executor puts the CPU into low-power sleep. The next hardware event wakes it.

💡 Fun Fact: The Rust compiler turns each async task into a state machine at compile time, and Embassy stores those state machines in statically allocated memory. No heap allocation, no runtime task control blocks. The resulting code is often smaller and faster than hand-written interrupt-based C.
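To get a feel for what "compiled into a state machine" means, here is a hand-written approximation of the blink task that runs on a host machine. It is a conceptual sketch only: `BlinkTask`, `BlinkState`, and `poll` are hypothetical names, time is passed in as a plain millisecond count, and the state machine the compiler really generates looks different in detail.

```rust
/// The points at which the blink task can be suspended and resumed.
#[derive(Debug, PartialEq, Clone, Copy)]
enum BlinkState {
    Toggle,
    Waiting { until_ms: u64 },
}

struct BlinkTask {
    state: BlinkState,
    led_on: bool,
}

impl BlinkTask {
    fn new() -> Self {
        Self { state: BlinkState::Toggle, led_on: false }
    }

    /// Runs the task until it would hit `.await` on a timer that has not
    /// expired yet, then returns so that other tasks can run.
    fn poll(&mut self, now_ms: u64) {
        loop {
            match self.state {
                BlinkState::Toggle => {
                    self.led_on = !self.led_on;
                    self.state = BlinkState::Waiting { until_ms: now_ms + 500 };
                }
                BlinkState::Waiting { until_ms } => {
                    if now_ms < until_ms {
                        return; // still waiting: yield back to the executor
                    }
                    self.state = BlinkState::Toggle;
                }
            }
        }
    }
}
```

The whole task is one small struct whose size is known at compile time, which is why no heap is needed. Polling it at 0 ms turns the LED on, polling again at 499 ms changes nothing, and polling at 500 ms toggles it off and schedules the next wake-up for 1000 ms.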

Sharing Data Between Tasks

Tasks that run independently are nice, but eventually they need to communicate. Embassy provides several primitives for this, all designed for embedded use (no heap, no std).

Channel — Message Queue

A Channel is a fixed-size queue. One task sends values, another receives them. If the channel is full, the sender waits. If empty, the receiver waits.

use embassy_stm32::exti::ExtiInput;
use embassy_stm32::gpio::Output;
use embassy_sync::blocking_mutex::raw::CriticalSectionRawMutex;
use embassy_sync::channel::Channel;
use embassy_time::Timer;

// Channel that holds up to 4 u32 values
static EVENT_CHANNEL: Channel<CriticalSectionRawMutex, u32, 4> = Channel::new();

#[embassy_executor::task]
async fn producer_task(mut button: ExtiInput<'static>) {
    let mut count: u32 = 0;
    loop {
        button.wait_for_falling_edge().await;
        count += 1;
        EVENT_CHANNEL.send(count).await;
        Timer::after_millis(50).await;
    }
}

#[embassy_executor::task]
async fn consumer_task(mut led: Output<'static>) {
    loop {
        let count = EVENT_CHANNEL.receive().await;
        defmt::info!("Event #{}", count);
        // Flash LED to acknowledge
        led.set_low();
        Timer::after_millis(100).await;
        led.set_high();
    }
}

Signal — Latest Value Notification

A Signal holds a single value and notifies the waiting task. Unlike Channel, if you signal multiple times before the receiver wakes up, only the latest value is kept. Perfect for sensor readings where you always want the most recent data.

use embassy_sync::blocking_mutex::raw::CriticalSectionRawMutex;
use embassy_sync::signal::Signal;
use embassy_time::Timer;

static TEMPERATURE: Signal<CriticalSectionRawMutex, i32> = Signal::new();

#[embassy_executor::task]
async fn sensor_task() {
    loop {
        // read_temperature() is a placeholder for your own sensor driver call
        let temp = read_temperature();
        TEMPERATURE.signal(temp);
        Timer::after_secs(1).await;
    }
}

#[embassy_executor::task]
async fn display_task() {
    loop {
        let temp = TEMPERATURE.wait().await;
        defmt::info!("Temperature: {} C", temp);
    }
}

Mutex — Shared Mutable Access

When multiple tasks need to read and write the same data structure, wrap it in a Mutex. The async mutex yields instead of spinning, so other tasks can run while one holds the lock.

use embassy_stm32::exti::ExtiInput;
use embassy_sync::blocking_mutex::raw::CriticalSectionRawMutex;
use embassy_sync::mutex::Mutex;
use embassy_time::Timer;

static STATE: Mutex<CriticalSectionRawMutex, u32> = Mutex::new(0);

#[embassy_executor::task]
async fn button_handler(mut button: ExtiInput<'static>) {
    loop {
        button.wait_for_falling_edge().await;
        {
            let mut count = STATE.lock().await;
            *count += 1;
        } // lock released here
        Timer::after_millis(50).await;
    }
}

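🧠 Think About It: We use CriticalSectionRawMutex everywhere. It protects data by briefly disabling interrupts, which makes it safe to share between tasks, executors, and interrupt handlers on a single-core MCU. If the data is only ever touched by tasks running in thread mode (never from an interrupt), ThreadModeRawMutex is a cheaper alternative. Note that ThreadModeRawMutex is itself only safe on single-core systems; it is not the multi-core option.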

Choosing the Right Primitive

| Primitive | Best For | Behavior |
|---|---|---|
| Channel | Event streams, command queues | Buffered FIFO, backpressure when full |
| Signal | Latest-value notifications | Single value, newer overwrites older |
| Mutex | Shared state accessed by multiple tasks | Exclusive access, async-aware locking |

What's Next?

You can now run concurrent tasks that communicate safely. The next chapter covers timers and PWM — hardware counters that let you generate precise waveforms, control servos, and run real-time control loops at exact frequencies.

Timers and PWM

Timers are the workhorses of embedded systems. At their core, they're just hardware counters that tick at a known rate. But from that simple mechanism comes an astonishing range of capabilities: generating precise waveforms, measuring pulse widths, driving motors, controlling servos, and running real-time control loops.

What Is a Hardware Timer?

A timer is a counter register inside the MCU that increments (or decrements) on every tick of its input clock. When it reaches a target value, it resets to zero and can trigger an action — fire an interrupt, toggle a pin, start a DMA transfer, or simply keep counting.

Clock ticks:   ↑  ↑  ↑  ↑  ↑  ↑  ↑  ↑  ↑  ↑  ↑  ↑  ↑
Counter:       0  1  2  3  4  5  0  1  2  3  4  5  0  1 ...
                              ↑                 ↑
                           Reset!            Reset!
                     (ARR = 5, resets at 5)

Two key registers control a timer's behavior:

  • PSC (Prescaler): Divides the input clock. A prescaler of 9 means the counter ticks once every 10 clock cycles.
  • ARR (Auto-Reload Register): The target value. When the counter reaches ARR, it resets and the cycle repeats.

Timer Math

The output frequency of a timer is determined by this formula:

Frequency = Timer_Clock / ((PSC + 1) × (ARR + 1))

The "+1" on both terms exists because the registers are zero-indexed — a PSC value of 0 means divide-by-1, not divide-by-0.

Example: Getting 1 kHz from a 240 MHz Clock

Let's say your timer's input clock is 240 MHz (typical for TIM1 on an H743 running at full speed) and you want a 1 kHz interrupt:

1,000 = 240,000,000 / ((PSC + 1) × (ARR + 1))
(PSC + 1) × (ARR + 1) = 240,000

One solution: PSC = 239, ARR = 999
Check: 240,000,000 / (240 × 1000) = 1,000 Hz ✓

There are many valid combinations. A good rule of thumb: use the prescaler to bring the frequency down to a manageable range, then use ARR for fine-tuning.
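
The formula is easy to check in a few lines of plain Rust. This is a minimal host-side sketch, not Embassy API: the names `timer_frequency_hz` and `find_psc_arr` are made up for illustration, and the search assumes a timer with a 16-bit prescaler and a 16-bit counter. It picks the smallest prescaler whose ARR still fits, which keeps the counter resolution as high as possible.

```rust
/// Output frequency in Hz: Timer_Clock / ((PSC + 1) * (ARR + 1)).
/// Uses integer division, so any fractional part is truncated.
pub fn timer_frequency_hz(timer_clock_hz: u64, psc: u16, arr: u16) -> u64 {
    timer_clock_hz / ((psc as u64 + 1) * (arr as u64 + 1))
}

/// Hypothetical helper: find a (PSC, ARR) pair for a 16-bit timer that hits
/// `target_hz` exactly. Returns None if no exact pair exists.
pub fn find_psc_arr(timer_clock_hz: u64, target_hz: u64) -> Option<(u16, u16)> {
    if target_hz == 0 || timer_clock_hz == 0 || timer_clock_hz % target_hz != 0 {
        return None;
    }
    // total = (PSC + 1) * (ARR + 1)
    let total = timer_clock_hz / target_hz;
    // Try the smallest prescaler divisor first.
    for div in 1..=65_536u64 {
        if total % div == 0 && total / div <= 65_536 {
            return Some(((div - 1) as u16, (total / div - 1) as u16));
        }
    }
    None
}
```

For a 240 MHz clock and a 1 kHz target this search returns PSC = 3, ARR = 59999, which is another valid solution alongside PSC = 239, ARR = 999.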

💡 Fun Fact: STM32 advanced timers (TIM1, TIM8) have 16-bit prescalers and 16-bit counters, giving you a maximum division factor of 65536 x 65536 = over 4 billion. That means you can generate frequencies from hundreds of MHz all the way down to fractions of a Hertz from a single timer.

PWM — Pulse Width Modulation

PWM is one of the most useful things you can do with a timer. Instead of just resetting at the target count, the timer also toggles an output pin at a specific count within each cycle. The result is a square wave where you control the duty cycle — the fraction of time the signal is HIGH.

100% duty:  ████████████████████████████████

 75% duty:  ████████████████████████________

 50% duty:  ████████████████________________

 25% duty:  ████████________________________

  0% duty:  ________________________________

Why is this useful? Because many devices respond to average power. An LED at 50% duty cycle emits about half as much light (your eye responds nonlinearly, so it will look somewhat brighter than half). A motor at 75% duty runs at roughly 75% speed. The switching happens so fast (typically thousands of times per second) that the physical device can't follow individual pulses; it just sees the average.

🧠 Think About It: Your laptop screen probably uses PWM for brightness control. At low brightness settings, some screens flicker at frequencies that sensitive people can perceive. Higher-quality displays use higher PWM frequencies or DC dimming to avoid this.

PWM with Embassy

Embassy makes PWM straightforward through its SimplePwm driver. Here's how to generate a PWM signal on a TIM1 channel:

#![no_std]
#![no_main]

use embassy_executor::Spawner;
use embassy_stm32::gpio::OutputType;
use embassy_stm32::time::khz;
use embassy_stm32::timer::simple_pwm::{PwmPin, SimplePwm};
use embassy_time::Timer;
use {defmt_rtt as _, panic_probe as _};

#[embassy_executor::main]
async fn main(_spawner: Spawner) {
    let p = embassy_stm32::init(Default::default());

    let pwm_pin = PwmPin::new_ch1(p.PE9, OutputType::PushPull);
    let mut pwm = SimplePwm::new(
        p.TIM1, Some(pwm_pin), None, None, None,
        khz(1), Default::default(),
    );

    let max_duty = pwm.get_max_duty();
    pwm.enable(embassy_stm32::timer::Channel::Ch1);

    // Breathe effect: ramp brightness up and down
    loop {
        for i in 0..=100 {
            pwm.set_duty(embassy_stm32::timer::Channel::Ch1, max_duty * i / 100);
            Timer::after_millis(10).await;
        }
        for i in (0..=100).rev() {
            pwm.set_duty(embassy_stm32::timer::Channel::Ch1, max_duty * i / 100);
            Timer::after_millis(10).await;
        }
    }
}

get_max_duty() returns the duty value that corresponds to 100%, which is derived from the timer's ARR setting. Setting duty to max_duty / 2 gives 50%. Setting it to max_duty * 75 / 100 gives 75%.
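
To make the percentage arithmetic explicit, here is a small helper in plain Rust. `duty_from_percent` is a hypothetical name, not part of Embassy. It assumes `max_duty` is the value returned by `get_max_duty()`, and it widens to 64 bits so the multiplication cannot overflow when `max_duty` is large (for example on a 32-bit timer).

```rust
/// Convert a percentage (clamped to 0..=100) into a PWM compare value.
pub fn duty_from_percent(max_duty: u32, percent: u32) -> u32 {
    let percent = percent.min(100) as u64;
    // Multiply in u64 first, then divide, to avoid overflow and keep precision.
    (max_duty as u64 * percent / 100) as u32
}
```

In the breathe loop this would replace the inline expression, for example `pwm.set_duty(ch, duty_from_percent(max_duty, i))`.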

Controlling a Servo Motor

Hobby servos expect a very specific PWM signal:

  • Frequency: 50 Hz (20 ms period)
  • Pulse width: 1 ms (0 degrees) to 2 ms (180 degrees)
  • Center position: 1.5 ms (90 degrees)
 0°:    ██__________________    (1ms HIGH, 19ms LOW)
90°:    ███_________________    (1.5ms HIGH, 18.5ms LOW)
180°:   ████________________    (2ms HIGH, 18ms LOW)
        ←────── 20ms ──────→

Here's how to control a servo with Embassy:

// Runs inside the async main from the previous example (p is the peripherals struct)
use embassy_stm32::time::hz;

let pwm_pin = PwmPin::new_ch1(p.PE9, OutputType::PushPull);
let mut pwm = SimplePwm::new(
    p.TIM1, Some(pwm_pin), None, None, None,
    hz(50), Default::default(),  // 50 Hz for servo
);
pwm.enable(embassy_stm32::timer::Channel::Ch1);
let max_duty = pwm.get_max_duty();

// Convert angle (0-180) to duty cycle
// 1ms = 5% of 20ms period, 2ms = 10% of 20ms period
let angle_to_duty = |angle: u32| -> u32 {
    let min_duty = max_duty * 5 / 100;   // 1ms pulse (0 degrees)
    let max_pulse = max_duty * 10 / 100;  // 2ms pulse (180 degrees)
    min_duty + (max_pulse - min_duty) * angle / 180
};

// Sweep from 0 to 180 degrees, one degree every 15 ms
for angle in 0..=180 {
    pwm.set_duty(embassy_stm32::timer::Channel::Ch1, angle_to_duty(angle));
    Timer::after_millis(15).await;
}

💡 Fun Fact: Servo control signals were standardized in the 1960s for radio-controlled aircraft. The 1-2ms pulse width range was chosen because the analog circuits of the era could reliably discriminate pulse widths in that range. We still use the same protocol today, over 60 years later.

Precise Periodic Loops with Ticker

Many embedded applications need to run code at an exact, consistent rate — PID control loops, sensor sampling, communication protocols. Embassy's Ticker provides precisely timed periodic execution:

// Ticker::every takes embassy_time::Duration, not core::time::Duration
use embassy_time::{Duration, Ticker};

#[embassy_executor::task]
async fn control_loop_task() {
    // Tick every 1ms = 1kHz control loop
    let mut ticker = Ticker::every(Duration::from_millis(1));

    loop {
        // read_encoder, compute_pid, set_motor_pwm, and target are
        // placeholders for your own application code

        // Read sensor
        let position = read_encoder();

        // Compute PID output
        let output = compute_pid(position, target);

        // Apply to motor
        set_motor_pwm(output);

        // Wait for next tick (compensates for execution time)
        ticker.next().await;
    }
}

The critical difference between Ticker and Timer::after_millis is drift compensation. Timer::after_millis(1) waits 1ms from the point you call it — so your loop period is 1ms plus whatever time your code took to execute. Ticker::every maintains a fixed period regardless of execution time. If your code takes 0.3ms, the ticker waits 0.7ms. If it occasionally takes 0.8ms, the ticker waits 0.2ms. The frequency stays locked.
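
The drift is easy to see in a host-side simulation. The sketch below is plain Rust with made-up function names; it models time in microseconds and does not use Embassy at all. One function sleeps a full period after the work (the `Timer::after_millis` pattern), the other sleeps until the next absolute deadline (the `Ticker` pattern).

```rust
/// Sleep a fixed delay after the work finishes.
/// Returns the time at which the last iteration ends.
pub fn relative_delay_end_us(period_us: u64, work_us: &[u64]) -> u64 {
    let mut now: u64 = 0;
    for w in work_us {
        now += *w;        // do the work
        now += period_us; // then sleep a full period, whatever the work took
    }
    now
}

/// Sleep until the next absolute deadline.
/// Returns the time at which the last iteration ends.
pub fn fixed_deadline_end_us(period_us: u64, work_us: &[u64]) -> u64 {
    let mut now: u64 = 0;
    let mut deadline: u64 = 0;
    for w in work_us {
        now += *w;             // do the work
        deadline += period_us; // deadlines advance by exactly one period
        if now < deadline {
            now = deadline;    // sleep only for the time that remains
        }
    }
    now
}
```

With a 1000 µs period and work times of 300, 800, 300, and 500 µs, the relative-delay loop finishes four iterations at 5900 µs, while the fixed-deadline loop finishes at exactly 4000 µs.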

🧠 Think About It: In a PID control loop running at 1 kHz, consistent timing directly affects the derivative and integral calculations. If your loop period varies randomly between 0.8ms and 1.5ms, your derivative term will produce noisy, unreliable values. Ticker solves this.

Timer Channels and Applications

Most STM32 timers have multiple output channels, each capable of independent PWM with different duty cycles but sharing the same frequency.

| Timer | Channels | Resolution | Typical Applications |
|---|---|---|---|
| TIM1, TIM8 | 4 + complementary | 16-bit | Motor control, complementary PWM, 3-phase inverters |
| TIM2, TIM5 | 4 | 32-bit | Long-duration timing, input capture, encoder interface |
| TIM3, TIM4 | 4 | 16-bit | General purpose PWM, LED control, buzzer |
| TIM6, TIM7 | 0 (basic) | 16-bit | DAC triggering, periodic interrupts |
| TIM12-TIM14 | 1-2 | 16-bit | Simple PWM, auxiliary timing |

TIM1/TIM8 are "advanced" timers with complementary outputs and dead-time insertion -- essential for H-bridge motor drivers. TIM2/TIM5 have 32-bit counters (up to ~17.9 seconds at 240 MHz). TIM6/TIM7 have no output pins -- they're purely internal, often used to trigger DAC conversions.

Putting It All Together

Here's a practical example combining PWM and async tasks -- an LED that breathes at a speed controlled by a button:

use embassy_stm32::exti::ExtiInput;
use embassy_stm32::timer::simple_pwm::SimplePwm;
use embassy_sync::blocking_mutex::raw::CriticalSectionRawMutex;
use embassy_sync::signal::Signal;
use embassy_time::Timer;

static SPEED: Signal<CriticalSectionRawMutex, u64> = Signal::new();

#[embassy_executor::task]
async fn button_task(mut button: ExtiInput<'static>) {
    let speeds = [5, 10, 20, 50]; // ms per step
    let mut index = 0;
    loop {
        button.wait_for_falling_edge().await;
        index = (index + 1) % speeds.len();
        SPEED.signal(speeds[index]);
        Timer::after_millis(50).await; // crude debounce
    }
}

#[embassy_executor::task]
async fn breathe_task(mut pwm: SimplePwm<'static, embassy_stm32::peripherals::TIM1>) {
    let max_duty = pwm.get_max_duty();
    let ch = embassy_stm32::timer::Channel::Ch1;
    pwm.enable(ch);
    let mut step_ms: u64 = 10;

    loop {
        for i in 0..=100 {
            // Pick up a new speed without blocking if the button task sent one
            if let Some(s) = SPEED.try_take() { step_ms = s; }
            pwm.set_duty(ch, max_duty * i / 100);
            Timer::after_millis(step_ms).await;
        }
        for i in (0..=100).rev() {
            if let Some(s) = SPEED.try_take() { step_ms = s; }
            pwm.set_duty(ch, max_duty * i / 100);
            Timer::after_millis(step_ms).await;
        }
    }
}

This combines PWM, async tasks, and inter-task communication -- all running cooperatively on a single-core MCU with no OS.

What's Next?

Timers and PWM give you precise control over time and waveforms. Next, we'll explore serial communication — UART, SPI, and I2C — the protocols that let your microcontroller talk to sensors, displays, and other devices.

UART — Serial Communication

UART is probably the first communication protocol you will ever use in embedded systems. It is how your microcontroller talks to your computer over a serial terminal. It is how GPS modules spit out coordinates. It is how two boards chat with each other over a pair of wires. And the beautiful thing is — it is dead simple.

How UART Works

UART stands for Universal Asynchronous Receiver/Transmitter. The key word is "asynchronous" — there is no shared clock wire. Both sides just agree ahead of time on how fast they will talk, and then they trust each other to keep time.

The wiring could not be simpler:

| Wire | Purpose |
|---|---|
| TX | Transmit: data goes out |
| RX | Receive: data comes in |
| GND | Ground: shared voltage reference |

That is it. Two data wires and a ground. TX on one device connects to RX on the other, and vice versa. The wires are crossed — this trips up everyone at least once, so do not feel bad when it happens to you.

  Device A          Device B
  --------          --------
  TX  ------------>  RX
  RX  <------------  TX
  GND -------------- GND

The UART Frame

When a UART line is idle, it sits HIGH (at 3.3V). To send a byte, the transmitter pulls the line LOW for one bit period — that is the start bit, a wake-up call that says "data incoming." Then it sends the 8 data bits, least significant bit first. Finally, it sends a stop bit (HIGH for one bit period) to mark the end of the frame.

Idle ──┐   ┌──┐  ┌──┐     ┌──┐  ┌───── Idle
       │   │  │  │  │ ... │  │  │
       └───┘  └──┘  └─────┘  └──┘
       Start  D0 D1  ...  D7  Stop

There is no address, no handshake, and no error checking beyond an optional parity bit. It is a fire-and-forget protocol. You transmit a byte and hope the other side was listening. (Spoiler: in practice, it works remarkably well.)
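
To make the frame layout concrete, this sketch builds the ten line levels for one 8N1 frame (1 start bit, 8 data bits, no parity, 1 stop bit). It is plain Rust for illustration only: `uart_frame_8n1` is a hypothetical name, and on real hardware the USART peripheral does all of this for you.

```rust
/// Line levels for one 8N1 frame, in transmission order.
/// `true` = HIGH (the idle level), `false` = LOW.
pub fn uart_frame_8n1(byte: u8) -> [bool; 10] {
    let mut frame = [true; 10];
    frame[0] = false; // start bit: line pulled LOW
    for i in 0..8 {
        // data bits, least significant bit first
        frame[1 + i] = (byte >> i) & 1 == 1;
    }
    frame[9] = true; // stop bit: line returns HIGH
    frame
}
```

For the byte 0x55 the result alternates on every bit: LOW, HIGH, LOW, HIGH, and so on, ending with the HIGH stop bit.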

Baud Rates

Since there is no clock wire, both devices must agree on the baud rate — the number of bits transmitted per second. If one side sends at 115200 baud and the other listens at 9600, you get garbage.

| Baud Rate | Use Case | Bits/sec | Approx. Bytes/sec |
|---|---|---|---|
| 9600 | GPS modules, slow sensors | 9,600 | ~960 |
| 115200 | Debug output, general use | 115,200 | ~11,520 |
| 921600 | High-speed telemetry, bulk data | 921,600 | ~92,160 |
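
The bytes-per-second column follows from the frame size: with the usual 8N1 framing, each byte costs 10 bits on the wire (start + 8 data + stop). A minimal sketch in plain Rust, with function names made up for illustration:

```rust
/// Maximum payload throughput in bytes per second, assuming 8N1 framing
/// (10 bits per byte) and back-to-back frames with no idle gaps.
pub fn bytes_per_second_8n1(baud: u32) -> u32 {
    baud / 10
}

/// Duration of a single bit in nanoseconds, rounded down.
/// Panics if `baud` is zero.
pub fn bit_time_ns(baud: u32) -> u32 {
    1_000_000_000 / baud
}
```

At 115200 baud a bit lasts about 8.68 µs, which is the narrowest pulse you should see on an oscilloscope or logic analyzer.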

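💡 Fun Fact: The standard baud rates are a legacy of teleprinters and early modems: the common rates are all multiples of 300 (9600 = 300 × 32, 115200 = 300 × 384). Serial terminals were already running at 9600 baud in the 1970s, and GPS modules still default to it decades later. Legacy is a powerful force in embedded systems.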

The Three Most Common UART Bugs

You will hit at least one of these in your first week. Probably all three.

1. TX and RX are swapped. You connected TX to TX and RX to RX. Remember: they cross. TX on your board goes to RX on the other device.

2. Baud rate mismatch. You see garbage characters in your serial terminal. The data is arriving, but the receiver is interpreting the bit timing incorrectly. Double-check both sides are set to the same baud rate.

3. Missing GND connection. You connected TX and RX but forgot the ground wire. Without a shared voltage reference, the receiver has no idea what "HIGH" and "LOW" mean. Always connect GND.

🧠 Think About It: The character U (0x55, binary 01010101) produces a perfectly alternating bit pattern on the wire. Why is it a popular test character when you need to measure an unknown baud rate with an oscilloscope or logic analyzer? Hint: every bit boundary is an edge, so the narrowest pulse you see is exactly one bit period.

UART in Embassy

Embassy makes UART wonderfully straightforward. You create a UART peripheral, give it the pins and a DMA channel, and start reading and writing.

Basic Setup and Transmit

use embassy_executor::Spawner;
use embassy_stm32::bind_interrupts;
use embassy_stm32::usart::{Config, Uart};

bind_interrupts!(struct Irqs {
    USART2 => embassy_stm32::usart::InterruptHandler<embassy_stm32::peripherals::USART2>;
});

#[embassy_executor::main]
async fn main(_spawner: Spawner) {
    let p = embassy_stm32::init(Default::default());

    let mut config = Config::default();
    config.baudrate = 115_200;

    let mut uart = Uart::new(
        p.USART2,      // peripheral
        p.PA3,          // RX pin
        p.PA2,          // TX pin
        Irqs,
        p.DMA1_CH6,    // TX DMA
        p.DMA1_CH5,    // RX DMA
        config,
    ).unwrap();

    // Send a message
    uart.write(b"Hello from STM32!\r\n").await.unwrap();
}

Reading Data

For receiving, UART has an interesting challenge — you often do not know how many bytes are coming. Embassy provides read_until_idle, which reads bytes until the line goes quiet for a moment. This is perfect for protocols like NMEA (GPS) where messages arrive as complete lines.

let mut buf = [0u8; 256];

loop {
    // Read until the line goes idle (no more data arriving)
    match uart.read_until_idle(&mut buf).await {
        Ok(n) => {
            // buf[..n] contains the received bytes
            defmt::info!("Received {} bytes: {:?}", n, &buf[..n]);
        }
        Err(e) => {
            defmt::error!("UART read error: {:?}", e);
        }
    }
}

Splitting TX and RX

Often you want one task transmitting telemetry while another task processes incoming data. Embassy lets you split a UART into separate TX and RX halves:

// `uart` is the Uart from the setup above; `spawner` is the Spawner
// passed to main (named `_spawner` in that example)
let (tx, rx) = uart.split();

// Now tx and rx can be moved into separate tasks
spawner.spawn(reader_task(rx)).unwrap();
spawner.spawn(writer_task(tx)).unwrap();

Practical: Reading GPS NMEA Sentences

GPS modules are the classic UART peripheral. They continuously spit out NMEA sentences — ASCII text lines that start with $ and end with \r\n. Here is what they look like:

$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47
$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A

use embassy_stm32::usart::UartRx;

#[embassy_executor::task]
async fn gps_task(mut rx: UartRx<'static, embassy_stm32::mode::Async>) {
    let mut buf = [0u8; 256];

    loop {
        match rx.read_until_idle(&mut buf).await {
            Ok(n) => {
                let data = &buf[..n];
                // Check if this is a GGA sentence (contains position fix)
                if data.starts_with(b"$GPGGA") || data.starts_with(b"$GNGGA") {
                    defmt::info!("Position fix: {=[u8]:a}", data);
                }
            }
            Err(_) => {
                defmt::warn!("GPS read error, continuing...");
            }
        }
    }
}
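
The task above only looks at the sentence prefix. Every NMEA sentence also ends with `*` followed by a two-digit hexadecimal checksum: the XOR of all bytes between `$` and `*`. Here is a minimal validation sketch in plain Rust, written without allocation so it could be dropped into a `no_std` project. `nmea_checksum_ok` is a hypothetical helper name, not part of Embassy.

```rust
/// Returns true if `sentence` (for example b"$GPGGA,...*47\r\n")
/// carries a checksum that matches its contents.
pub fn nmea_checksum_ok(sentence: &[u8]) -> bool {
    // Must start with '$' and contain a '*' before the checksum digits.
    if sentence.first() != Some(&b'$') {
        return false;
    }
    let star = match sentence.iter().position(|&b| b == b'*') {
        Some(i) => i,
        None => return false,
    };
    // XOR every byte strictly between '$' and '*'.
    let computed = sentence[1..star].iter().fold(0u8, |acc, &b| acc ^ b);

    // Parse the two hex digits that follow '*'.
    let digits = match sentence.get(star + 1..star + 3) {
        Some(d) => d,
        None => return false,
    };
    let hex = |c: u8| (c as char).to_digit(16).map(|d| d as u8);
    match (hex(digits[0]), hex(digits[1])) {
        (Some(hi), Some(lo)) => computed == (hi << 4) | lo,
        _ => false,
    }
}
```

In the GPS task this check would sit in front of the prefix test, so that sentences corrupted by noise or a baud rate mismatch are dropped before any parsing.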

💡 Fun Fact: NMEA stands for National Marine Electronics Association. The protocol was originally designed for ships, which is why even your tiny drone GPS module speaks in sentences that start with $GP (GPS), $GL (GLONASS), or $GN (multi-constellation).

Sending Telemetry

UART is also great for sending data out. Here is a pattern for periodic telemetry:

use core::fmt::Write;
use embassy_stm32::usart::UartTx;
use embassy_time::{Duration, Ticker};
use heapless::String;

#[embassy_executor::task]
async fn telemetry_task(mut tx: UartTx<'static, embassy_stm32::mode::Async>) {
    let mut ticker = Ticker::every(Duration::from_hz(10)); // 10 Hz

    loop {
        ticker.next().await;

        // read_battery_voltage and read_temperature are placeholders
        // for your own sensor code
        let voltage = read_battery_voltage().await;
        let temp = read_temperature().await;

        // Format into a fixed-capacity string (no heap needed)
        let mut msg: String<128> = String::new();
        write!(msg, "V:{:.2},T:{:.1}\r\n", voltage, temp).unwrap();

        tx.write(msg.as_bytes()).await.unwrap();
    }
}

UART vs USART

You will notice the STM32 peripheral is called USART, not UART. The "S" stands for Synchronous. USART can operate in two modes:

| Feature | UART Mode | USART Synchronous Mode |
|---|---|---|
| Clock wire | No | Yes (CK pin) |
| Baud agreement | Both sides configure | Master provides clock |
| Use case | General purpose | Rare, mostly SPI-like comms |

In practice, you will almost always use USART in asynchronous (UART) mode. The synchronous mode exists but is rarely needed — if you want a clocked protocol, SPI (next chapter) is a better choice.

Summary

UART is the "hello world" of embedded communication. Two wires, no clock, simple framing. It is the protocol you will use for debugging, for GPS, for talking to Bluetooth modules, and for a hundred other things. Its simplicity is its superpower — when something is not working, there are only three wires to check.

In the next chapter, we will look at SPI — a faster, clocked protocol that trades simplicity for speed.

SPI — High-Speed Serial Bus

If UART is a casual phone conversation — both sides talking whenever they feel like it — then SPI is a military radio channel. One device is the master, it controls the clock, and everyone else listens. The result is a protocol that is fast, reliable, and beautifully deterministic.

How SPI Works

SPI stands for Serial Peripheral Interface. It uses four wires:

| Wire | Full Name | Direction | Purpose |
|---|---|---|---|
| SCK | Serial Clock | Master -> Slave | Clock signal generated by master |
| MOSI | Master Out, Slave In | Master -> Slave | Data from master to slave |
| MISO | Master In, Slave Out | Slave -> Master | Data from slave to master |
| CS | Chip Select | Master -> Slave | Selects which slave to talk to (active LOW) |

The master generates the clock on SCK. On each clock pulse, one bit shifts out on MOSI (from master to slave) and simultaneously one bit shifts in on MISO (from slave to master). This is full duplex — data flows both directions at the same time.

  Master                Slave
  ------                -----
  SCK   ──────────────>  SCK
  MOSI  ──────────────>  MOSI (data in)
  MISO  <──────────────  MISO (data out)
  CS    ──────────────>  CS   (active LOW)
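
The simultaneous exchange is easier to picture as two shift registers connected in a ring. The following plain-Rust sketch simulates eight clock pulses. It assumes MSB-first transfer, which is the common default, though some devices are LSB-first; `spi_exchange` is a made-up name for illustration.

```rust
/// Simulate one 8-bit SPI transfer.
/// Returns (byte received by master, byte received by slave).
pub fn spi_exchange(master_tx: u8, slave_tx: u8) -> (u8, u8) {
    let mut master_reg = master_tx;
    let mut slave_reg = slave_tx;
    for _ in 0..8 {
        // Each side puts its most significant bit on its output line...
        let mosi = master_reg >> 7;
        let miso = slave_reg >> 7;
        // ...then shifts left and takes the other side's bit in at the bottom.
        master_reg = (master_reg << 1) | miso;
        slave_reg = (slave_reg << 1) | mosi;
    }
    (master_reg, slave_reg)
}
```

After eight clocks the two registers have swapped contents, which is why an SPI read always sends something (often zeros) and an SPI write always receives something (often ignored).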

When CS is HIGH, the slave ignores everything on the bus. When the master pulls CS LOW, the slave wakes up and participates in the transfer. This is how you put multiple slaves on the same bus — they all share SCK, MOSI, and MISO, but each gets its own CS line.

  Master
  ------
  SCK   ──────┬──────── Slave A (SCK)
  MOSI  ──────┬──────── Slave A (MOSI)
  MISO  ──────┬──────── Slave A (MISO)
  CS_A  ──────────────── Slave A (CS)
              │
              ├──────── Slave B (SCK)
              ├──────── Slave B (MOSI)
              ├──────── Slave B (MISO)
  CS_B  ──────────────── Slave B (CS)

💡 Fun Fact: SPI has no formal specification document. Motorola (now NXP) invented it in the 1980s, but never published an official standard. Every vendor implements it slightly differently, which is why SPI datasheets require careful reading.

SPI Speed

SPI can run incredibly fast. While UART tops out around 1 Mbps in practice, SPI routinely runs at:

| Speed | Typical Use Case |
|---|---|
| 1 MHz | Conservative, works with anything |
| 10 MHz | Most sensors and peripherals |
| 20-50 MHz | Flash memory, fast displays |
| 50+ MHz | High-speed ADCs, FPGAs |

The STM32H743 can push SPI up to 150 MHz on its SPI peripheral. In practice, you are usually limited by the slave device and your PCB trace quality, but even budget sensors happily run at 10 MHz.

SPI Modes

Here is where SPI gets a bit fiddly. The clock signal has two configurable properties:

  • CPOL (Clock Polarity): Is the clock idle LOW (0) or idle HIGH (1)?
  • CPHA (Clock Phase): Is data sampled on the first clock edge (0) or the second (1)?

This gives four combinations, called SPI modes:

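| Mode | CPOL | CPHA | Clock Idle | Data Sampled On |
|---|---|---|---|---|
| Mode 0 | 0 | 0 | LOW | Rising edge |
| Mode 1 | 0 | 1 | LOW | Falling edge |
| Mode 2 | 1 | 0 | HIGH | Falling edge |
| Mode 3 | 1 | 1 | HIGH | Rising edge |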

The vast majority of sensors and peripherals use Mode 0 or Mode 3. Mode 0 is the most common default. Always check the datasheet of the device you are talking to.
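
The mode number is just the two bits packed together, with CPOL as the high bit and CPHA as the low bit. A small sketch in plain Rust; the type and function names are illustrative and are not Embassy's own types.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Edge {
    Rising,
    Falling,
}

/// Split an SPI mode number (0..=3) into (CPOL, CPHA).
pub fn mode_to_cpol_cpha(mode: u8) -> (bool, bool) {
    (mode & 0b10 != 0, mode & 0b01 != 0)
}

/// Clock edge on which data is sampled.
/// With CPHA = 0, data is sampled on the first edge after idle: a rising
/// edge when the clock idles LOW (CPOL = 0), falling when it idles HIGH.
/// CPHA = 1 moves sampling to the second edge, which is the opposite one.
pub fn sample_edge(cpol: bool, cpha: bool) -> Edge {
    if cpol == cpha {
        Edge::Rising
    } else {
        Edge::Falling
    }
}
```

Running all four modes through `sample_edge` reproduces the last column of the table: rising, falling, falling, rising.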

🧠 Think About It: Mode 0 and Mode 3 both sample data on the rising edge. The only difference is the idle state of the clock. Why might a device prefer one over the other? Think about what happens on the very first clock edge after CS goes low.

SPI in Embassy

Basic Setup

use embassy_executor::Spawner;
use embassy_stm32::gpio::{Level, Output, Speed};
use embassy_stm32::spi::{Config, Spi};
use embassy_stm32::time::Hertz;

#[embassy_executor::main]
async fn main(_spawner: Spawner) {
    let p = embassy_stm32::init(Default::default());

    let mut spi_config = Config::default();
    spi_config.frequency = Hertz(1_000_000); // 1 MHz to start

    let mut spi = Spi::new(
        p.SPI1,
        p.PA5,          // SCK
        p.PA7,          // MOSI
        p.PA6,          // MISO
        p.DMA2_CH3,     // TX DMA
        p.DMA2_CH2,     // RX DMA
        spi_config,
    );

    // CS is a regular GPIO — you control it manually
    let mut cs = Output::new(p.PA4, Level::High, Speed::VeryHigh);

    // Perform a transfer
    let tx_data = [0x01, 0x02, 0x03];
    let mut rx_data = [0u8; 3];

    cs.set_low();                             // Select slave
    spi.transfer(&mut rx_data, &tx_data).await.unwrap();
    cs.set_high();                            // Deselect slave

    defmt::info!("Received: {:?}", rx_data);
}

Notice that CS is just a regular GPIO pin. Embassy does not manage it for you — you pull it LOW before a transfer and HIGH after. This gives you full control over multi-byte transactions where CS must stay low the entire time.

Write-Only and Read-Only

Sometimes you just want to send data (like to a display) or just read data:

// Write only: ignores incoming data
cs.set_low();
spi.write(&[0xAA, 0xBB, 0xCC]).await.unwrap();
cs.set_high();

// Read only: sends zeros while reading
let mut buf = [0u8; 4];
cs.set_low();
spi.read(&mut buf).await.unwrap();
cs.set_high();

Practical: Reading an IMU Over SPI

Most SPI sensors use a register-based protocol. To read a register, you send the register address with the read bit set (usually bit 7 = 1), then clock in the response.

Here is an example reading the WHO_AM_I register from an ICM-42688 IMU:

use embassy_stm32::gpio::Output;
use embassy_stm32::spi::Spi;

const WHO_AM_I_REG: u8 = 0x75;
const READ_FLAG: u8 = 0x80;         // Bit 7 set = read operation

async fn read_register(
    spi: &mut Spi<'_, embassy_stm32::mode::Async>,
    cs: &mut Output<'_>,
    reg: u8,
) -> u8 {
    let tx = [reg | READ_FLAG, 0x00]; // Send address, then dummy byte
    let mut rx = [0u8; 2];

    cs.set_low();
    spi.transfer(&mut rx, &tx).await.unwrap();
    cs.set_high();

    rx[1] // First byte is garbage (received while address was sent)
}

// Usage:
let who_am_i = read_register(&mut spi, &mut cs, WHO_AM_I_REG).await;
defmt::info!("WHO_AM_I = 0x{:02x}", who_am_i); // Should be 0x47

💡 Fun Fact: The WHO_AM_I register is a tradition in sensor design. It returns a fixed, known value so you can verify you are talking to the right chip. If you read it and get 0x00 or 0xFF, something is wrong with your wiring.

Debugging SPI

SPI bugs have distinctive signatures. Learn to recognize them:

| Symptom | Likely Cause | Fix |
|---|---|---|
| All zeros (0x00) | CS stuck HIGH, slave is not selected | Check CS pin, make sure it goes LOW |
| All ones (0xFF) | Slave not powered, or MISO floating | Check power supply, verify wiring |
| Garbage data | Clock too fast for the slave | Reduce SPI frequency |
| Correct first byte, then garbage | CS bouncing between bytes | Keep CS LOW during entire transaction |
| Data shifted by one bit | Wrong SPI mode (CPOL/CPHA) | Check datasheet, try Mode 0 and Mode 3 |

🧠 Think About It: Why does a slave that is not selected return all zeros, while a slave that is not powered returns all ones? Think about what happens to the MISO line in each case. (Hint: active drive vs pull-up resistors.)

SPI vs UART

| Feature | UART | SPI |
|---|---|---|
| Wires | 2 (TX, RX) | 4+ (SCK, MOSI, MISO, CS) |
| Speed | Up to ~1 Mbps | Up to 150 MHz |
| Duplex | Full | Full |
| Multiple devices | No (point to point) | Yes (shared bus + individual CS) |
| Clock | No (async) | Yes (master provides) |
| Best for | Debug, GPS, Bluetooth | IMUs, flash, displays, fast sensors |

Summary

SPI is the go-to protocol when you need speed and reliability. It trades simplicity (more wires) for performance (tens of MHz, full duplex, deterministic timing). You will use it for IMUs, barometers, flash memory, SD cards, and displays.

The key things to remember: the master controls the clock, CS is active LOW and you manage it yourself, and always check the datasheet for the correct SPI mode. Get those right, and SPI just works.

Next up: I2C — a two-wire bus that trades speed for simplicity and lets you connect dozens of sensors with just two wires.

I2C — Two-Wire Sensor Bus

If SPI is a four-lane highway, I2C is a two-lane country road — slower, narrower, but it goes everywhere. You can connect a temperature sensor, a pressure sensor, an OLED display, and an EEPROM all on the same two wires. No extra chip-select lines. No extra pins. Just two wires and some clever addressing.

How I2C Works

I2C (pronounced "I-squared-C" or "I-two-C") stands for Inter-Integrated Circuit. It was invented by Philips (now NXP) in 1982 and has been the go-to protocol for low-speed sensors ever since.

The bus uses just two wires:

| Wire | Purpose |
|---|---|
| SCL | Serial Clock: driven by the master |
| SDA | Serial Data: bidirectional, shared by everyone |

Plus GND of course. Both lines need pull-up resistors (typically 4.7 kΩ to 3.3V). This is not optional: without pull-ups, nothing works.

  3.3V ──[4.7k]──┐
                 │
  SCL ───────────┴──── Slave A ──── Slave B ──── Slave C

  3.3V ──[4.7k]──┐
                 │
  SDA ───────────┴──── Slave A ──── Slave B ──── Slave C

  GND ──────────────── Slave A ──── Slave B ──── Slave C

Why Pull-Up Resistors?

I2C uses an open-drain bus. Devices can only pull the line LOW — they cannot drive it HIGH. The pull-up resistors pull the lines back to HIGH when nobody is pulling them down. This is what allows multiple devices to share the same wires without electrical conflict.
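
One way to picture an open-drain bus is as a wired-AND: the line is HIGH only if no device is pulling it LOW. A tiny plain-Rust model, with an illustrative function name:

```rust
/// Resulting level of an open-drain line with a pull-up resistor.
/// Each entry says whether that device is currently pulling the line LOW.
/// Returns true for HIGH, false for LOW.
pub fn open_drain_level(pulling_low: &[bool]) -> bool {
    // The pull-up wins only when every device has released the line.
    !pulling_low.iter().any(|&p| p)
}
```

No combination of inputs produces a conflict: the worst case is that several devices pull LOW at once, and the line is simply LOW.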

🧠 Think About It: Why did the designers choose open-drain instead of push-pull? Think about what happens if two devices try to drive the same wire to different levels at the same time.

Addresses

Every device on an I2C bus has a 7-bit address (0x00 to 0x7F, though some are reserved). When the master wants to talk to a specific device, it sends that device's address first. Only the addressed device responds. Everyone else stays quiet.

A 7-bit address gives 128 possible values, and about 112 remain (0x08 to 0x77) once the reserved ranges are removed. In practice, you rarely use more than a dozen devices on one bus.

Speed

| Mode | Speed | Use Case |
|---|---|---|
| Standard | 100 kHz | Default, works with everything |
| Fast | 400 kHz | Most modern sensors |
| Fast Plus | 1 MHz | Some newer devices |
| High Speed | 3.4 MHz | Rarely used with MCUs |

Most of the time, you will use 400 kHz (Fast mode). It works with nearly all modern sensors and gives a nice balance of speed and reliability.

Start, Stop, ACK, and NACK

An I2C transaction looks like this:

  1. Start condition: Master pulls SDA LOW while SCL is HIGH — this wakes everyone up
  2. Address byte: Master sends the 7-bit slave address plus a read/write bit
  3. ACK: The addressed slave pulls SDA LOW for one clock cycle — "I'm here!"
  4. Data bytes: One or more bytes are transferred, each followed by an ACK
  5. NACK and Stop: When reading, the master NACKs the last byte to tell the slave to stop sending. Every transaction then ends with a stop condition: SDA goes HIGH while SCL is HIGH

If nobody ACKs the address byte, the master knows something is wrong — either the address is incorrect or the device is not on the bus.
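
Step 2 packs the 7-bit address and the direction into a single byte: the address occupies the upper seven bits, and the read/write flag is bit 0 (1 = read, 0 = write). A sketch in plain Rust, with a hypothetical helper name:

```rust
/// Build the first byte of an I2C transaction from a 7-bit address.
/// Bit 0 is the R/W flag: 1 = read, 0 = write.
pub fn i2c_address_byte(addr7: u8, read: bool) -> u8 {
    ((addr7 & 0x7F) << 1) | (read as u8)
}
```

This is why some datasheets quote an "8-bit address" pair such as 0xEC/0xED for a device whose 7-bit address is 0x76. The Embassy calls in this chapter take the 7-bit form.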

💡 Fun Fact: The I2C patent expired in 2006, which is why you now find I2C on absolutely everything. Before that, manufacturers had to pay Philips a licensing fee, which is why some early microcontrollers used a compatible but differently-named protocol called "TWI" (Two-Wire Interface).

I2C in Embassy

Basic Setup

use embassy_executor::Spawner;
use embassy_stm32::bind_interrupts;
use embassy_stm32::i2c::{Config, I2c};
use embassy_stm32::time::Hertz;

bind_interrupts!(struct Irqs {
    I2C1_EV => embassy_stm32::i2c::EventInterruptHandler<embassy_stm32::peripherals::I2C1>;
    I2C1_ER => embassy_stm32::i2c::ErrorInterruptHandler<embassy_stm32::peripherals::I2C1>;
});

#[embassy_executor::main]
async fn main(_spawner: Spawner) {
    let p = embassy_stm32::init(Default::default());

    // `mut` because the read and write calls below take &mut self
    let mut i2c = I2c::new(
        p.I2C1,
        p.PB6,          // SCL
        p.PB7,          // SDA
        Irqs,
        p.DMA1_CH6,     // TX DMA
        p.DMA1_CH0,     // RX DMA
        Hertz(400_000),  // 400 kHz — Fast mode
        Config::default(),
    );
}

Reading a Sensor Register

The most common I2C pattern is write-then-read: you write the register address you want to read, then read back the data. Embassy provides write_read for exactly this:

const BMP280_ADDR: u8 = 0x76;
const CHIP_ID_REG: u8 = 0xD0;

let mut chip_id = [0u8; 1];
i2c.write_read(BMP280_ADDR, &[CHIP_ID_REG], &mut chip_id).await.unwrap();

defmt::info!("BMP280 chip ID: 0x{:02x}", chip_id[0]); // Should be 0x58

Writing to a Register

To write a register, you send the register address followed by the data — all in one write call:

const CTRL_MEAS_REG: u8 = 0xF4;
const NORMAL_MODE: u8 = 0b0010_0111; // temp x1, press x1, normal mode

i2c.write(BMP280_ADDR, &[CTRL_MEAS_REG, NORMAL_MODE]).await.unwrap();

Reading Multiple Bytes

Many sensors let you read a burst of consecutive registers in a single transaction:

const PRESS_MSB_REG: u8 = 0xF7;

let mut raw_data = [0u8; 6]; // 3 bytes pressure + 3 bytes temperature
i2c.write_read(BMP280_ADDR, &[PRESS_MSB_REG], &mut raw_data).await.unwrap();

// Each value is 20 bits: MSB, LSB, then the top 4 bits of XLSB
let raw_pressure = ((raw_data[0] as u32) << 12)
    | ((raw_data[1] as u32) << 4)
    | ((raw_data[2] as u32) >> 4);

let raw_temp = ((raw_data[3] as u32) << 12)
    | ((raw_data[4] as u32) << 4)
    | ((raw_data[5] as u32) >> 4);

defmt::info!("Raw pressure: {}, Raw temp: {}", raw_pressure, raw_temp);

Common I2C Devices

You will encounter these addresses over and over:

Device | Description | Address(es)
BMP280 / BME280 | Pressure + temperature (+ humidity) | 0x76 or 0x77 (SDO pin selects)
MPU6050 | 6-axis IMU (accel + gyro) | 0x68 or 0x69 (AD0 pin selects)
AT24Cxx EEPROM | Non-volatile storage | 0x50 - 0x57 (A0-A2 pins select)
SSD1306 | 128x64 OLED display | 0x3C or 0x3D
AHT20 | Temperature + humidity | 0x38
INA219 | Current/power monitor | 0x40 - 0x4F (A0-A1 pins select)

Notice how some devices have configurable addresses. The BMP280, for example, has an SDO pin — tie it to GND and the address is 0x76, tie it to VCC and it is 0x77. This lets you put two BMP280s on the same bus.

💡 Fun Fact: If you do not know the address of a device, you can scan the entire bus. Send a write to every address from 0x08 to 0x77. Any address that ACKs has a device on it. This is called an "I2C scan" and it is the first thing to try when a sensor is not responding.
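
The scan described above can be sketched in a few lines. This is a host-testable illustration, not Embassy's API: `scan_bus` and its `probe` callback are hypothetical names, and `probe` stands in for "attempt a write to this address and report whether it was ACKed". It uses `Vec` for brevity; on a `no_std` target you would collect into a fixed-size array instead.

```rust
/// Probe every valid 7-bit address (0x08..=0x77) and collect the ones
/// for which `probe` reports an ACK.
///
/// `probe` is a hypothetical stand-in for a short write on the real bus.
pub fn scan_bus(mut probe: impl FnMut(u8) -> bool) -> Vec<u8> {
    (0x08u8..=0x77).filter(|&addr| probe(addr)).collect()
}
```

On hardware, `probe` would wrap whatever short write your HAL offers and return true when it succeeds.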

I2C vs SPI — When to Use Which

This is one of the most common questions in embedded development. Here is the honest comparison:

Feature | I2C | SPI
Wires | 2 (SCL, SDA) + GND | 4+ (SCK, MOSI, MISO, CS) + GND
Speed | 100 kHz - 1 MHz typical | 1 MHz - 50+ MHz
Duplex | Half duplex | Full duplex
Multiple devices | Up to 112 on the same 2 wires (7-bit addresses 0x08 - 0x77) | One CS pin per device
Pull-up resistors | Required (4.7k typical) | Not needed
Pin count for 5 sensors | 2 pins total | 8 pins (SCK + MOSI + MISO + 5 CS)
Best for | Temperature, pressure, EEPROM, slow sensors | IMUs, flash, displays, fast ADCs
Protocol overhead | Address + ACK per transaction | Just CS toggle

Rule of thumb: Use I2C when you have many slow sensors and few pins. Use SPI when you need speed or are talking to high-bandwidth devices.

Debugging I2C

I2C problems have characteristic symptoms:

No ACK (NACK on address byte):

  • Wrong address. Double-check the datasheet — some list the 8-bit address (left-shifted by 1), not the 7-bit address.
  • Missing pull-up resistors. Without them, nothing works. Measure the voltage on SCL and SDA — they should sit at 3.3V when idle.
  • Device not powered. Check VCC.
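
The 7-bit versus 8-bit confusion is easy to check in code. The helpers below are a small sketch (the function names are made up for illustration): the address byte on the wire is the 7-bit address shifted left by one, with the read/write bit in bit 0. The examples in this chapter pass the 7-bit form to the driver.

```rust
/// Convert an 8-bit "datasheet style" address (R/W bit in bit 0)
/// to the 7-bit form.
pub fn to_7bit(addr_8bit: u8) -> u8 {
    addr_8bit >> 1
}

/// Build the first byte of a transaction: the 7-bit address followed by
/// the R/W bit (0 = write, 1 = read).
pub fn address_byte(addr_7bit: u8, read: bool) -> u8 {
    ((addr_7bit & 0x7F) << 1) | read as u8
}
```

So if a datasheet quotes 0xEC and 0xED for a device, the 7-bit address to use is 0x76.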

SDA stuck LOW (bus lockup):

  • A slave got confused mid-transaction and is holding SDA low. Fix: toggle SCL manually 9 times — this clocks out any stuck slave.

Data corruption or intermittent failures:

  • Pull-up resistors too weak (too high a value). Try 2.2k instead of 4.7k, especially with long wires or many devices.
  • Bus capacitance too high. Long wires or too many devices load the bus. Shorten wires or reduce speed.

🧠 Think About It: Why does toggling SCL 9 times fix a stuck bus? Think about the I2C protocol — a slave releases SDA after the 9th clock pulse (the ACK bit) of every byte. Clocking 9 times guarantees you hit that release point regardless of where the slave got stuck.

Summary

I2C is the protocol of choice when you want simplicity and need to connect multiple sensors. Two wires, pull-up resistors, and 7-bit addresses give you a clean, well-defined bus that just works — as long as you remember those pull-ups.

The pattern you will use most often is write_read: send a register address, read back data. Master that, and you can talk to any I2C sensor on the market.

Next up: ADC — turning analog voltages from the real world into numbers your code can work with.

ADC — Reading Analog Voltages

The real world is analog. Temperature is not 25 or 26 degrees — it is 25.37 degrees. A battery does not suddenly go from "full" to "empty" — its voltage droops gradually. A joystick does not just point "left" or "right" — it has a smooth range of positions. To work with these real-world signals, you need an ADC — an Analog-to-Digital Converter.

How an ADC Works

An ADC measures a voltage and converts it to a number. That is really all it does.

On most STM32 chips, the ADC measures voltages between 0V and 3.3V (the reference voltage). It outputs a number between 0 and some maximum value that depends on the resolution:

Resolution | Max Value | Voltage per Step (3.3V reference) | Typical Use
8-bit | 255 | 12.94 mV | Quick and dirty, low-precision
10-bit | 1,023 | 3.23 mV | Basic sensors
12-bit | 4,095 | 0.81 mV | Most STM32 families (F1, F4, G0)
16-bit | 65,535 | 0.05 mV | STM32H7 (high precision)

The formula to convert a reading to voltage is:

voltage = (reading / max_value) x reference_voltage

For a 12-bit ADC on a 3.3V system:

voltage = (reading / 4095) x 3.3

For the H743 at 16-bit resolution:

voltage = (reading / 65535) x 3.3
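
The same formula works for any resolution if the maximum value is derived from the bit count. A minimal sketch (the function name `adc_to_volts` is an illustrative choice, not part of any HAL):

```rust
/// Convert a raw ADC reading to volts.
/// `bits` is the ADC resolution (8, 10, 12 or 16); `vref` is the reference voltage.
pub fn adc_to_volts(reading: u16, bits: u32, vref: f32) -> f32 {
    let max_value = ((1u32 << bits) - 1) as f32; // 4095 for 12-bit, 65535 for 16-bit
    reading as f32 / max_value * vref
}
```

For example, `adc_to_volts(raw, 12, 3.3)` reproduces the 12-bit formula above.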

💡 Fun Fact: A 16-bit ADC can distinguish voltage differences of about 50 microvolts. That is roughly the voltage generated by the thermoelectric effect of touching two different metals together. At this resolution, even PCB trace layout and component placement start to matter.

ADC in Embassy

Basic Reading

Embassy makes ADC straightforward. Here is the simplest possible example — read one pin, print the value:

use embassy_executor::Spawner;
use embassy_stm32::adc::{Adc, SampleTime};
use embassy_time::Timer;

#[embassy_executor::main]
async fn main(_spawner: Spawner) {
    let p = embassy_stm32::init(Default::default());

    let mut adc = Adc::new(p.ADC1);

    // Some STM32 families need a brief delay for the ADC to stabilize
    Timer::after_millis(1).await;

    let mut pin = p.PA0; // Analog input pin

    loop {
        let raw = adc.blocking_read(&mut pin);
        let voltage = (raw as f32 / 4095.0) * 3.3;

        defmt::info!("Raw: {}, Voltage: {:.3}V", raw, voltage);
        Timer::after_millis(100).await;
    }
}

Sampling Time

The ADC does not measure instantaneously — it needs time to "sample" the voltage. STM32 ADCs let you configure the sampling time, which is the number of ADC clock cycles spent measuring the input before converting:

Sampling Time | When to Use
Short (1.5 - 7.5 cycles) | Fast signals, low-impedance sources
Medium (28.5 - 56 cycles) | General purpose
Long (239.5 - 640 cycles) | High-impedance sources, precision measurements

A longer sampling time gives the ADC's internal capacitor more time to charge to the input voltage, producing more accurate readings — at the cost of speed.

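// Applies to the conversions that follow on this ADC.
// Variant names are family-specific: CYCLES239_5 matches F1-style ADCs,
// while the F4's longest setting is 480 cycles. Check the SampleTime enum
// of your HAL version for the exact names.
adc.set_sample_time(SampleTime::CYCLES239_5);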

🧠 Think About It: Why would a high-impedance source (like a voltage divider with large resistors) need a longer sampling time? Think about RC time constants — the ADC's internal sampling capacitor needs to be charged through the source impedance.
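
To put numbers on that RC argument, here is a rough estimate of the minimum sampling time. It assumes the simple model where the sampling capacitor charges through the source resistance plus the ADC's internal switch resistance, and must settle to within half an LSB. The function names and the example values in the usage note (6 kΩ switch resistance, 4 pF sampling capacitor) are assumptions for illustration; take the real figures from your chip's datasheet.

```rust
/// Minimum sampling time in nanoseconds for the input to settle within 1/2 LSB.
pub fn min_sample_time_ns(r_source_ohm: f64, r_adc_ohm: f64, c_adc_pf: f64, bits: u32) -> f64 {
    // ohms x picofarads = 1e-12 s, which is 1e-3 ns
    let tau_ns = (r_source_ohm + r_adc_ohm) * c_adc_pf * 1e-3;
    // A residual error of 2^-(bits+1) takes ln(2^(bits+1)) time constants
    tau_ns * f64::from(bits + 1) * std::f64::consts::LN_2
}

/// Convert a time in nanoseconds to ADC clock cycles.
pub fn adc_cycles(time_ns: f64, adc_clock_hz: f64) -> f64 {
    time_ns * 1e-9 * adc_clock_hz
}
```

With a 10k source, the assumed 6k and 4 pF, and 12 bits, this works out to roughly 580 ns, or about 17 cycles of a 30 MHz ADC clock. A 100k source pushes that past 100 cycles, which is why large divider resistors call for the long sampling settings.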

Reading the Internal Temperature Sensor

Every STM32 has an internal temperature sensor connected to the ADC. It is not very accurate (plus or minus 3 degrees Celsius typical), but it is free and requires no external components:

use embassy_stm32::adc::{Adc, SampleTime};

let mut adc = Adc::new(p.ADC1);

// enable_temperature() switches the internal sensor on and returns its channel
// (method name as in recent embassy-stm32 releases; check your version)
let mut temp_channel = adc.enable_temperature();

// Internal channels typically need longer sampling time
adc.set_sample_time(SampleTime::CYCLES239_5);

let raw = adc.blocking_read(&mut temp_channel);

// The conversion formula varies by STM32 family; check the datasheet.
// For STM32F4: temp_celsius = ((v_sense - V25) / avg_slope) + 25,
// where v_sense = raw / 4095 * 3.3 is the sensor voltage (not the raw count).
// V25 and avg_slope are in the datasheet's electrical characteristics.
defmt::info!("Internal temp raw: {}", raw);
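
As a sketch of that conversion for the F4, the function below turns a raw 12-bit reading into degrees Celsius. The constants (V25 = 0.76 V, average slope = 2.5 mV per degree) are typical values from STM32F4 datasheets and are used here as assumptions; other families use different values and sometimes a different formula.

```rust
/// Convert a raw 12-bit reading of the internal sensor to degrees Celsius
/// (STM32F4-style formula, typical datasheet constants assumed).
pub fn f4_temp_celsius(raw: u16, vref: f32) -> f32 {
    const V25: f32 = 0.76; // sensor voltage at 25 degrees C, in volts (assumed typical)
    const AVG_SLOPE: f32 = 0.0025; // volts per degree C (assumed typical)

    // The formula works on the sensor voltage, so convert the count first
    let v_sense = raw as f32 / 4095.0 * vref;
    (v_sense - V25) / AVG_SLOPE + 25.0
}
```

Because the sensor is uncalibrated, treat the result as a trend indicator rather than an absolute measurement.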

Practical: Battery Monitoring with a Voltage Divider

Here is a real-world problem. You have a 12V battery pack (a 3S LiPo, for example, which peaks at 12.6V when fully charged) powering your robot, and you want to monitor its voltage. But the ADC can only read up to 3.3V. If you connect 12V directly to the ADC pin, you will destroy it.

The solution is a voltage divider — two resistors that scale the voltage down to a safe range.

  Battery (12V) ──── [R1 = 10k] ──┬── [R2 = 3.3k] ──── GND
                                   │
                              ADC Pin (reads here)

The voltage at the ADC pin is:

V_adc = V_battery x R2 / (R1 + R2)
V_adc = 12V x 3300 / (10000 + 3300)
V_adc = 12V x 0.248
V_adc = 2.98V    (safely under 3.3V)

To convert back from ADC reading to battery voltage:

V_battery = V_adc x (R1 + R2) / R2
V_battery = V_adc x 4.03

Here is the code:

const DIVIDER_RATIO: f32 = (10_000.0 + 3_300.0) / 3_300.0; // 4.03

loop {
    let raw = adc.blocking_read(&mut battery_pin);
    let v_adc = (raw as f32 / 4095.0) * 3.3;
    let v_battery = v_adc * DIVIDER_RATIO;

    defmt::info!("Battery: {:.2}V", v_battery);

    if v_battery < 10.5 {
        defmt::warn!("LOW BATTERY! {:.2}V", v_battery);
        // Trigger low-battery alarm, land the drone, etc.
    }

    Timer::after_millis(500).await;
}

💡 Fun Fact: The resistor values 10k and 3.3k are chosen because 3.3k is a standard E24 series value and the ratio gives a comfortable margin below the 3.3V maximum. For a 4S LiPo (16.8V max), you might use 10k + 2.2k instead, giving V_adc = 16.8 x 0.18 = 3.03V.
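
Before soldering a divider, it is worth checking the worst case in code. The helpers below are a small sketch with made-up names; they apply the divider formula from this section in both directions and test whether the highest expected input stays under the ADC limit.

```rust
/// Voltage at the ADC pin for a given input voltage (R1 on top, R2 to GND).
pub fn divider_out(v_in: f32, r1: f32, r2: f32) -> f32 {
    v_in * r2 / (r1 + r2)
}

/// Input voltage recovered from the voltage measured at the ADC pin.
pub fn divider_in(v_adc: f32, r1: f32, r2: f32) -> f32 {
    v_adc * (r1 + r2) / r2
}

/// True if the highest expected input keeps the ADC pin at or below `v_limit`.
pub fn is_within_limit(v_in_max: f32, r1: f32, r2: f32, v_limit: f32) -> bool {
    divider_out(v_in_max, r1, r2) <= v_limit
}
```

With 10k and 3.3k, a fully charged 3S pack at 12.6V still passes the check against 3.3V. A 4S pack at 16.8V does not, which is why the 2.2k lower resistor is suggested for that case.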

Practical: Reading a Potentiometer

A potentiometer (knob) is the simplest analog input — it outputs a voltage between 0V and 3.3V depending on its position. Wire the three pins: one outer leg to 3.3V, the other to GND, and the middle (wiper) to an ADC pin.

loop {
    let raw = adc.blocking_read(&mut pot_pin);

    // Map to a 0-100% range
    let percent = (raw as f32 / 4095.0) * 100.0;

    // Or map to a servo angle (0-180 degrees)
    let angle = (raw as f32 / 4095.0) * 180.0;

    defmt::info!("Pot: {:.1}%  Servo: {:.1} deg", percent, angle);
    Timer::after_millis(50).await;
}

Noise and Accuracy

Raw ADC readings are noisy. You will see the value jitter by a few counts even when the input voltage is perfectly stable. Here are common techniques to deal with this:

Averaging: Read multiple samples and average them.

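use embassy_stm32::adc::{Adc, AdcChannel};
use embassy_stm32::peripherals::ADC1;

// Type and trait names follow recent embassy-stm32 releases (older ones use AdcPin).
// `samples` must be non-zero.
fn read_averaged(
    adc: &mut Adc<'_, ADC1>,
    pin: &mut impl AdcChannel<ADC1>,
    samples: u32,
) -> u16 {
    let mut sum: u32 = 0;
    for _ in 0..samples {
        sum += adc.blocking_read(pin) as u32;
    }
    (sum / samples) as u16
}

let stable_reading = read_averaged(&mut adc, &mut battery_pin, 16);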

Decoupling capacitor: Place a 100nF ceramic capacitor between the ADC pin and GND, as close to the MCU as possible. This filters high-frequency noise before it reaches the ADC.

Reference voltage: The ADC's accuracy is only as good as its reference voltage. Most STM32s use VDDA (analog supply voltage) as the reference. If VDDA is noisy, all your readings will be noisy. Use proper decoupling on the VDDA pin.

🧠 Think About It: If you average 16 samples, you reduce random noise by a factor of 4 (the square root of 16). This is called oversampling. Some STM32 ADCs have hardware oversampling built in — check if your chip supports it before writing your own averaging loop.
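
Oversampling can also buy extra resolution rather than just a quieter reading. The usual recipe is to sum 4^n samples and shift the sum right by n bits, which yields a result with n additional bits. The sketch below assumes a 12-bit source and uses a closure named `read` as a stand-in for the actual ADC call; both names are illustrative. The technique only adds real information when the input carries about one LSB of noise to dither it.

```rust
/// Sum 4^extra_bits samples and decimate, producing a reading with
/// `extra_bits` more bits of resolution than the raw samples.
pub fn oversample(mut read: impl FnMut() -> u16, extra_bits: u32) -> u32 {
    let count = 1u32 << (2 * extra_bits); // 4^n samples
    let sum: u32 = (0..count).map(|_| read() as u32).sum();
    sum >> extra_bits // keep n extra bits instead of dividing them all away
}
```

With `extra_bits = 2`, sixteen 12-bit samples become one 14-bit value: an input hovering between counts 2000 and 2001 comes out as 8002, which is 2000.5 on the original scale.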

Summary

The ADC is your bridge between the analog real world and your digital code. The core idea is simple — voltage in, number out. But the details matter: resolution determines your precision, sampling time affects accuracy, voltage dividers let you measure higher voltages safely, and averaging tames noise.

For most projects, 12-bit resolution with a simple software average is more than enough. Do not overthink it: read the pin, convert to voltage, and move on. You can always add sophistication later.

Next chapter, we will look at DMA — the hardware feature that lets your ADC (and UART, and SPI) run autonomously without bothering the CPU.

DMA — Direct Memory Access

Imagine you are a chef in a busy kitchen. You need ingredients from the pantry. You could walk to the pantry yourself, grab each item one at a time, and walk back — but then you are not cooking. Now imagine you have a sous-chef whose only job is to fetch ingredients and put them on your counter. You just say "get me flour, eggs, and butter" and keep cooking. That sous-chef is DMA.

Why DMA Matters

Without DMA, the CPU handles every single byte of every transfer. When you receive 100 bytes over UART at 115200 baud, the CPU must:

  1. Get interrupted for each byte
  2. Read the byte from the peripheral register
  3. Store it in a buffer in memory
  4. Go back to whatever it was doing
  5. Repeat 99 more times

That is 100 interrupts in about 9 milliseconds. The CPU can do it, but it is wasting time on grunt work when it could be computing sensor fusion or running a PID loop.
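
The 9 millisecond figure follows from the frame format. Assuming the common 8N1 framing (one start bit, eight data bits, one stop bit), each byte costs ten bit times on the wire. A small sketch with an illustrative function name:

```rust
/// Time on the wire, in microseconds, for `bytes` bytes at `baud` bits per
/// second, assuming 8N1 framing (10 bit times per byte).
pub fn uart_transfer_time_us(bytes: u64, baud: u64) -> u64 {
    bytes * 10 * 1_000_000 / baud
}
```

At 115200 baud this gives one byte, and therefore one interrupt, roughly every 87 microseconds.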

With DMA, the conversation goes like this:

  1. CPU tells the DMA controller: "Copy bytes from UART data register to this buffer. Let me know when you have 100 bytes."
  2. CPU goes back to computing.
  3. DMA silently shuttles bytes from the peripheral to memory, one at a time, without involving the CPU at all.
  4. When the transfer is complete, DMA fires a single interrupt: "All done."

The CPU did zero work during the transfer. It was free to run your control loop, process sensor data, or even sleep.

💡 Fun Fact: DMA is not unique to microcontrollers. Your PC's GPU uses DMA to read textures from RAM. Your SSD controller uses DMA to transfer data. Network cards use DMA. Any time a peripheral needs to move large amounts of data, DMA is the answer. The concept dates back to the UNIVAC I computer in 1951.

How DMA Works (Conceptually)

A DMA controller is a simple hardware unit that can copy data between two locations — typically between a peripheral's data register and a memory buffer. You configure it with:

Parameter | Description
Source | Where to read from (e.g., UART data register)
Destination | Where to write to (e.g., a buffer in RAM)
Transfer count | How many bytes (or half-words, or words) to move
Direction | Peripheral-to-memory, memory-to-peripheral, or memory-to-memory
Increment | Whether to increment the source/destination address after each transfer

For a UART receive, you would configure: source = UART data register (fixed address, no increment), destination = your buffer (incrementing address), count = buffer size.

The DMA controller then watches the peripheral. Every time the UART receives a byte, DMA grabs it and puts it in the next slot in your buffer. No CPU involvement at all.
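
To make the fixed-source, incrementing-destination idea concrete, here is a tiny software model of a peripheral-to-memory channel. It is purely illustrative: real DMA is configured through registers and runs in hardware, and the type and method names below are invented for this sketch.

```rust
/// Software model of a peripheral-to-memory DMA channel (illustration only).
pub struct DmaChannel<'a> {
    dest: &'a mut [u8], // destination buffer; its length is the transfer count
    next: usize,        // destination offset, incremented after each byte
}

impl<'a> DmaChannel<'a> {
    pub fn new(dest: &'a mut [u8]) -> Self {
        Self { dest, next: 0 }
    }

    /// Called once per peripheral request ("a byte arrived").
    /// The source (the data register) never moves; only the destination does.
    /// Returns true once the transfer is complete, the point where real
    /// hardware would raise its transfer-complete interrupt.
    pub fn on_request(&mut self, data_register: u8) -> bool {
        if self.next < self.dest.len() {
            self.dest[self.next] = data_register;
            self.next += 1;
        }
        self.next == self.dest.len()
    }
}
```

Feeding three bytes into a three-byte buffer returns false, false, then true, mirroring the single interrupt at the end of a hardware transfer.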

Embassy Makes DMA Easy

Here is the beautiful thing about Embassy: you have already been using DMA. When you passed DMA channels to Uart::new or Spi::new in the previous chapters, you were enabling DMA transfers. The .await on a read or write operation sets up a DMA transfer, suspends your task, and wakes it up when the transfer completes.

// This UART transfer uses DMA, so the CPU is free while data arrives
let mut buf = [0u8; 256];
let n = uart.read_until_idle(&mut buf).await.unwrap();
// ^ CPU was doing other things during this entire transfer

Compare with a blocking (non-DMA) approach:

// This blocks the CPU for the entire transfer
let mut buf = [0u8; 256];
uart.blocking_read(&mut buf).unwrap(); // CPU polls the UART for EACH byte

The same applies to SPI:

// DMA transfer: the CPU is free
let tx = [0u8; 1024];
let mut rx = [0u8; 1024];
spi.transfer(&mut rx, &tx).await.unwrap();

// Without DMA, the CPU would poll the SPI data register for all 1024 bytes.
// At 10 MHz SPI, that is roughly 820 microseconds (1024 x 8 bits / 10 MHz),
// and those are microseconds your control loop might need.

Passing DMA Channels

In Embassy, you assign DMA channels when creating a peripheral. Each peripheral transfer (TX and RX) needs its own DMA channel:

// UART with DMA
let uart = Uart::new(
    p.USART2,
    p.PA3, p.PA2, // RX, TX
    Irqs,
    p.DMA1_CH6,   // TX DMA channel
    p.DMA1_CH5,   // RX DMA channel
    config,
).unwrap();

// SPI with DMA
let spi = Spi::new(
    p.SPI1,
    p.PA5, p.PA7, p.PA6, // SCK, MOSI, MISO
    p.DMA2_CH3,   // TX DMA channel
    p.DMA2_CH2,   // RX DMA channel
    spi_config,
);

// I2C with DMA
let i2c = I2c::new(
    p.I2C1,
    p.PB6, p.PB7, // SCL, SDA
    Irqs,
    p.DMA1_CH7,   // TX DMA channel (DMA1_CH6 is already owned by the UART above)
    p.DMA1_CH0,   // RX DMA channel
    Hertz(400_000),
    Default::default(),
);

🧠 Think About It: If DMA is strictly better, why does Embassy even offer blocking_read and blocking_write? Think about single-byte transfers. Setting up a DMA transfer has overhead (configuring registers, handling the interrupt). For a 1-byte transfer, the DMA setup time can exceed the time it takes the CPU to just do the transfer itself.

The H743 Memory Trap

This section is critical if you are using an STM32H7 or STM32F7. If you are on an F1, F4, or G0, feel free to skip it — but read it anyway, because you will encounter H7 boards eventually.

The STM32H743 has a complicated memory map. The Cortex-M7 core has tightly-coupled memory (TCM) for maximum performance:

Memory Region | Start Address | Size (H743) | CPU Access | DMA Access
DTCM | 0x2000_0000 | 128 KB | Fast (0 wait) | No
ITCM | 0x0000_0000 | 64 KB | Fast (0 wait) | No
AXI SRAM | 0x2400_0000 | 512 KB | Normal | Yes
SRAM1 | 0x3000_0000 | 128 KB | Normal | Yes
SRAM2 | 0x3002_0000 | 128 KB | Normal | Yes
SRAM3 | 0x3004_0000 | 32 KB | Normal | Yes
SRAM4 | 0x3800_0000 | 64 KB | Normal | Yes

Here is the trap: if your linker script places RAM at 0x2000_0000, the stack lives in DTCM. And the general-purpose DMA controllers (DMA1 and DMA2) cannot access DTCM. So if you declare a buffer on the stack and pass it to a DMA transfer, the DMA controller literally cannot read or write that memory. The transfer silently fails or produces corrupted data. No error. No panic. Just wrong data.

// THIS WILL SILENTLY FAIL ON H743
async fn broken_read(uart: &mut Uart<'_, Async>) {
    let mut buf = [0u8; 64]; // <-- This is on the stack, which is in DTCM
    uart.read_until_idle(&mut buf).await.unwrap(); // DMA cannot write here!
}

The Fix: Place DMA Buffers in Accessible SRAM

Use the #[link_section] attribute to place your buffer in a SRAM region that DMA can access:

// Option 1: a raw static buffer (accessing a `static mut` requires `unsafe`)
#[link_section = ".sram1"]
static mut DMA_BUF: [u8; 256] = [0u8; 256];

// Option 2: a more Rust-idiomatic approach with Embassy.
// Named differently so both declarations can coexist in one file.
#[link_section = ".axisram"]
static SHARED_DMA_BUF: embassy_sync::mutex::Mutex<
    embassy_sync::blocking_mutex::raw::CriticalSectionRawMutex,
    [u8; 256],
> = embassy_sync::mutex::Mutex::new([0u8; 256]);

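You also need to define these memory regions in your linker script (memory.x), along with SECTIONS entries (not shown) that map the .sram1 and .axisram section names onto them. Note that the RAM region below points at AXI SRAM, so with the usual cortex-m-rt linker script the stack, .data, and .bss land there rather than in DTCM: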

MEMORY
{
    FLASH  : ORIGIN = 0x08000000, LENGTH = 2M
    DTCM   : ORIGIN = 0x20000000, LENGTH = 128K
    RAM    : ORIGIN = 0x24000000, LENGTH = 512K
    SRAM1  : ORIGIN = 0x30000000, LENGTH = 128K
    SRAM2  : ORIGIN = 0x30020000, LENGTH = 128K
}

Warning: This is the single most common source of mysterious bugs on the STM32H7. If your DMA transfers return all zeros, random data, or seem to "work sometimes," check your buffer placement first.
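
A cheap way to check buffer placement is to test the buffer's address against the DMA-reachable ranges from the table above. The function below is a sketch with an invented name; the ranges are the H743 values listed in this section, with SRAM1 through SRAM3 merged because they are contiguous.

```rust
/// True if the byte range [addr, addr + len) lies entirely inside a RAM region
/// that DMA1/DMA2 can reach on the STM32H743 (ranges taken from the table above).
pub fn h743_dma_reachable(addr: usize, len: usize) -> bool {
    // (start address, size in bytes)
    const REGIONS: [(usize, usize); 3] = [
        (0x2400_0000, 512 * 1024), // AXI SRAM
        (0x3000_0000, 288 * 1024), // SRAM1 + SRAM2 + SRAM3 (contiguous)
        (0x3800_0000, 64 * 1024),  // SRAM4
    ];
    REGIONS
        .iter()
        .any(|&(start, size)| addr >= start && addr.saturating_add(len) <= start + size)
}
```

On the target you could call it as `debug_assert!(h743_dma_reachable(buf.as_ptr() as usize, buf.len()))` just before starting a transfer, so a misplaced buffer fails loudly in debug builds instead of silently corrupting data.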

Cache Coherency (H7/F7 Only)

The Cortex-M7 core in the H7 and F7 has a data cache. This means the CPU does not always read directly from RAM — it reads from a fast local copy. This creates a problem with DMA:

  1. CPU writes data to a buffer (goes into the cache, not necessarily to RAM)
  2. DMA reads that buffer from RAM (sees stale data — the CPU's writes are still in the cache)
  3. DMA sends garbage to the peripheral

Or the reverse:

  1. DMA writes received data to a buffer in RAM
  2. CPU reads the buffer (gets stale data from the cache, not the fresh DMA data)

The simplest fix is to place DMA buffers in a non-cacheable memory region. SRAM1 through SRAM4 on the H743 can be configured as non-cacheable using the MPU (Memory Protection Unit).

// In your Embassy H7 configuration, you typically configure the MPU
// to mark SRAM regions as non-cacheable for DMA use.
// Embassy's H7 examples include this setup; follow them closely.

Alternatively, you can manually invalidate/clean the cache before and after DMA transfers, but this is error-prone. Using non-cacheable regions is the safer approach.
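
If you do go the manual route, one detail matters a great deal: cache maintenance works on whole cache lines, which are 32 bytes on the Cortex-M7. A DMA buffer that shares a line with unrelated data can cause that data to be lost when the line is invalidated. The sketch below shows one way to give a buffer its own lines; the type and function names are illustrative.

```rust
/// Data cache line size on the Cortex-M7, in bytes.
pub const CACHE_LINE: usize = 32;

/// A byte buffer aligned to a cache line. Because a type's size is always a
/// multiple of its alignment, the buffer never shares a line with other data.
#[repr(C, align(32))]
pub struct CacheAligned<const N: usize>(pub [u8; N]);

/// Round a length up to a whole number of cache lines, for use as the
/// length argument of a clean or invalidate operation.
pub const fn round_up_to_line(len: usize) -> usize {
    (len + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE
}
```

A `CacheAligned<40>` therefore occupies 64 bytes, and maintenance operations on it should cover `round_up_to_line(40)` bytes.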

💡 Fun Fact: Cache coherency is the same problem that plagues multi-core CPUs in desktop computers. The H7's M7 core + DMA controller is essentially a tiny multi-core system. Enterprise CPUs solve this with hardware cache coherency protocols (like MESI). The Cortex-M7 does not have this, so you have to manage it yourself.

When DMA Matters Most

Not every transfer needs DMA. Here is a rough guide:

Scenario | DMA Worth It? | Why
UART debug prints | Maybe | Frees CPU, but small transfers
UART GPS stream (continuous) | Yes | Constant data flow while CPU computes
SPI sensor read (10 bytes) | No | Setup overhead exceeds transfer time
SPI display update (64 KB) | Yes | Huge transfer, CPU is blocked for ages
ADC continuous sampling | Yes | Hundreds of samples per second
I2C sensor read (6 bytes) | No | Tiny transfer, barely worth the setup

The general rule: use DMA when the transfer is large or continuous, and the CPU has better things to do.

Summary

DMA is your CPU's best friend. It offloads the boring work of moving bytes from point A to point B, leaving the CPU free to do actual computation. In Embassy, you get DMA almost for free — just pass DMA channels to your peripheral constructors, and every .await on a transfer uses DMA automatically.

The one gotcha to remember: on the H7, DMA1 and DMA2 cannot access DTCM memory (where the stack often lives), and on both the H7 and F7 the data cache can cause coherency issues. Place DMA buffers in SRAM1 or another DMA-accessible, non-cacheable region.

Next up: the watchdog timer — your firmware's safety net for when things go wrong.

The Watchdog

Your firmware has a bug. It will always have a bug. Maybe a sensor returns unexpected data that triggers an infinite loop. Maybe a pointer goes somewhere it should not and the whole thing locks up. Maybe a cosmic ray flips a bit in RAM (yes, this actually happens). When your firmware hangs, what happens to the hardware it controls?

If your firmware is blinking an LED, nothing bad happens. If your firmware is controlling a motor, a heater, or a drone's propellers — a firmware hang means the last command keeps executing forever. The motor runs at full speed. The heater stays on. The drone flies into a wall.

This is why watchdogs exist.

What Is a Watchdog?

A watchdog timer is a hardware countdown timer. You start it, and it begins counting down. Before it reaches zero, you must "pet" (or "kick" or "feed") the watchdog — which resets the countdown. If your firmware hangs and fails to pet the watchdog in time, the countdown reaches zero and the watchdog resets the entire MCU.

  Start       Pet       Pet       Pet       CRASH (no pet)
    │          │          │          │          │
    ▼          ▼          ▼          ▼          ▼
    [████████] [████████] [████████] [████████] [████░░░░] --> RESET!
    Counting   Restarted  Restarted  Restarted  Timed out

It is a dead man's switch. As long as your firmware is healthy and running its main loop, it pets the watchdog and everything is fine. The moment it hangs, the watchdog notices and reboots the system.

💡 Fun Fact: The term "watchdog" comes from a real watchdog — a dog that barks if an intruder enters. In computing, the concept dates back to the 1960s when NASA used hardware watchdog circuits on early spacecraft. If the computer froze, the watchdog circuit would trigger a hardware reset. The Mars rovers use watchdog timers too.

Two Types of Watchdog

STM32 microcontrollers have two independent watchdog peripherals:

Independent Watchdog (IWDG)

The IWDG is the simple, reliable workhorse. It runs on its own independent clock — the LSI (Low-Speed Internal) oscillator, typically 32 kHz. This means:

  • It works even if the main system clock fails
  • It works even if the PLL crashes
  • It works even in low-power modes
  • It is available on every single STM32 ever made

The IWDG is simple: configure a timeout period, start it, and pet it before the timeout. That is it.

Feature | Detail
Clock source | LSI (~32 kHz, varies by chip)
Timeout range | ~125 microseconds to ~32 seconds
Can be stopped? | No. Once started, it cannot be stopped (on most STM32s)
Survives clock failure? | Yes
Complexity | Very low

Warning: On most STM32 families, once you start the IWDG, software cannot stop it. It keeps counting until the MCU is reset. This is by design: if malicious or buggy code could disable the watchdog, it would defeat the purpose.
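
The timeout range in the table follows from how the IWDG counts: the LSI clock is divided by a prescaler (4 to 256 in powers of two) and the result decrements a 12-bit reload value, so timeout = prescaler x (reload + 1) / f_LSI. The sketch below picks the smallest prescaler that fits a requested timeout. The function name is invented, and in practice the HAL constructor does this arithmetic for you from the microsecond value you pass in.

```rust
/// Find (prescaler, reload) for a requested IWDG timeout.
/// Returns None if the timeout is zero or longer than the hardware can count.
pub fn iwdg_settings(timeout_us: u64, lsi_hz: u64) -> Option<(u32, u32)> {
    for prescaler in [4u64, 8, 16, 32, 64, 128, 256] {
        // One counter tick lasts prescaler / lsi_hz seconds;
        // compute how many ticks cover the timeout, rounding up.
        let tick_scale = prescaler * 1_000_000;
        let ticks = (timeout_us * lsi_hz + tick_scale - 1) / tick_scale;
        if (1..=4096).contains(&ticks) {
            // The 12-bit reload register holds ticks - 1
            return Some((prescaler as u32, (ticks - 1) as u32));
        }
    }
    None
}
```

For a 2 second timeout at 32 kHz this yields a prescaler of 16 and a reload of 3999, since 16 x 4000 / 32000 = 2 s.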

Window Watchdog (WWDG)

The WWDG is the IWDG's fussier cousin. Instead of just requiring a pet before a deadline, it requires the pet to happen within a specific time window — not too early and not too late.

  ┌──────────────────────────────────────────────────┐
  │  TOO EARLY          WINDOW            TOO LATE   │
  │  (pet = reset)      (pet = OK)        (= reset)  │
  │  ████████████        ░░░░░░░░          ██████████ │
  └──────────────────────────────────────────────────┘

Why would you want this? It catches a different class of bugs. The IWDG only catches a full hang. The WWDG also catches:

  • Code running too fast (skipping important work)
  • A tight infinite loop that accidentally pets the watchdog each iteration
  • Timing violations that indicate corrupted control flow

Feature | IWDG | WWDG
Pet timing | Before deadline | Within a window
Clock source | Independent LSI | APB1 clock
Survives clock failure? | Yes | No (needs APB1)
Can detect "too fast"? | No | Yes
Complexity | Low | Medium

For most projects, the IWDG is all you need. The WWDG is for safety-critical systems where you need to verify your control loop is running at the correct rate.
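
The window rule is simple enough to model in a few lines. On STM32 parts the WWDG is a 7-bit down-counter: a refresh while the counter is still above the window value triggers a reset, and so does letting the counter fall below 0x40. The enum and function names below are invented for this sketch, which models the rule only and does not touch any registers.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum PetOutcome {
    TooEarly, // refreshed while the counter was still above the window: reset
    InWindow, // refreshed inside the window: counter reloaded, all good
    TooLate,  // counter already fell below 0x40: reset
}

/// Classify a refresh attempt from the current down-counter value and the
/// configured window value.
pub fn wwdg_pet(counter: u8, window: u8) -> PetOutcome {
    if counter < 0x40 {
        PetOutcome::TooLate
    } else if counter > window {
        PetOutcome::TooEarly
    } else {
        PetOutcome::InWindow
    }
}
```

With a window of 0x50, a refresh at counter value 0x7F is too early, one at 0x48 is accepted, and by 0x3F the reset has already fired.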

🧠 Think About It: Imagine firmware that has a bug causing it to skip sensor reads but still pet the watchdog. The IWDG would not catch this — the firmware is still running, just incorrectly. How would you design your petting strategy so the IWDG catches this too? (Hint: only pet the watchdog after successfully completing all critical tasks.)

IWDG in Embassy

Basic Setup

use embassy_executor::Spawner;
use embassy_stm32::wdg::IndependentWatchdog;

#[embassy_executor::main]
async fn main(_spawner: Spawner) {
    let p = embassy_stm32::init(Default::default());

    // Create a watchdog with a 2-second timeout
    let mut watchdog = IndependentWatchdog::new(p.IWDG, 2_000_000); // microseconds

    // Start the watchdog — no going back after this!
    watchdog.unleash();

    loop {
        // Do your work
        do_important_stuff().await;

        // Pet the watchdog — must happen within 2 seconds
        watchdog.pet();
    }
}

Choosing the Timeout

The timeout should be long enough that your firmware can always pet in time during normal operation, but short enough that a hang is detected quickly.

Timeout | Good For
100 ms | Fast control loops (motor control, flight controllers)
500 ms | Medium-speed systems (robotics, sensor hubs)
1 - 2 s | General purpose, forgiving
5+ s | Systems with long processing tasks

A good starting point: set the timeout to 3 to 5 times your main loop period. If your loop runs at 100 Hz (10 ms), a 50 ms timeout gives plenty of margin.

Practical: Watchdog with Task Architecture

In a real Embassy application, you often have multiple tasks. The question is: where do you pet the watchdog? If you pet it in one task, it does not tell you whether the other tasks are healthy.

Here is a pattern: each task sets a "heartbeat" flag, and a supervisor task only pets the watchdog if all flags are set.

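use core::sync::atomic::{AtomicBool, Ordering};
use embassy_executor::Spawner;
use embassy_stm32::wdg::IndependentWatchdog;
use embassy_time::Timer;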

static SENSOR_OK: AtomicBool = AtomicBool::new(false);
static CONTROL_OK: AtomicBool = AtomicBool::new(false);
static COMMS_OK: AtomicBool = AtomicBool::new(false);

#[embassy_executor::task]
async fn sensor_task() {
    loop {
        read_sensors().await;
        SENSOR_OK.store(true, Ordering::Relaxed);
        Timer::after_millis(10).await;
    }
}

#[embassy_executor::task]
async fn control_task() {
    loop {
        run_pid_loop().await;
        CONTROL_OK.store(true, Ordering::Relaxed);
        Timer::after_millis(10).await;
    }
}

#[embassy_executor::task]
async fn comms_task() {
    loop {
        send_telemetry().await;
        COMMS_OK.store(true, Ordering::Relaxed);
        Timer::after_millis(100).await;
    }
}

#[embassy_executor::main]
async fn main(spawner: Spawner) {
    let p = embassy_stm32::init(Default::default());

    let mut watchdog = IndependentWatchdog::new(p.IWDG, 1_000_000);

    spawner.spawn(sensor_task()).unwrap();
    spawner.spawn(control_task()).unwrap();
    spawner.spawn(comms_task()).unwrap();

    watchdog.unleash();

    loop {
        Timer::after_millis(200).await;

        let all_ok = SENSOR_OK.load(Ordering::Relaxed)
            && CONTROL_OK.load(Ordering::Relaxed)
            && COMMS_OK.load(Ordering::Relaxed);

        if all_ok {
            watchdog.pet();

            // Reset all flags — tasks must set them again
            SENSOR_OK.store(false, Ordering::Relaxed);
            CONTROL_OK.store(false, Ordering::Relaxed);
            COMMS_OK.store(false, Ordering::Relaxed);
        }
        // If any task has stalled, we do NOT pet, and the watchdog resets us
    }
}

This pattern ensures that every critical task must be running correctly for the watchdog to get fed. If the sensor task hangs, or the control task panics, or the comms task gets stuck — the watchdog catches it.
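
One possible variation, sketched here under assumptions rather than taken from Embassy: pack the heartbeats into a single atomic bitmask so the supervisor can check and clear them in one step. The names are invented for illustration. Note that the read-modify-write operations used here are not available natively on Cortex-M0/M0+ cores, where you would need a critical section or a polyfill crate instead.

```rust
use core::sync::atomic::{AtomicU8, Ordering};

pub const SENSOR: u8 = 1 << 0;
pub const CONTROL: u8 = 1 << 1;
pub const COMMS: u8 = 1 << 2;
pub const ALL_TASKS: u8 = SENSOR | CONTROL | COMMS;

/// One bit per task; a set bit means "this task reported in".
pub struct Heartbeats(AtomicU8);

impl Heartbeats {
    pub const fn new() -> Self {
        Self(AtomicU8::new(0))
    }

    /// Called by a task after it completes one healthy iteration.
    pub fn beat(&self, task_bit: u8) {
        self.0.fetch_or(task_bit, Ordering::Relaxed);
    }

    /// Returns true, and clears all bits, only if every task has reported.
    /// Check and clear happen as a single atomic step.
    pub fn all_alive_and_reset(&self) -> bool {
        self.0
            .compare_exchange(ALL_TASKS, 0, Ordering::Relaxed, Ordering::Relaxed)
            .is_ok()
    }
}
```

The supervisor loop would then call `watchdog.pet()` only when `all_alive_and_reset()` returns true, and adding a fourth task means adding one constant rather than another static.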

💡 Fun Fact: This "multi-task heartbeat" pattern is standard practice in aerospace and automotive firmware. Functional-safety standards such as ISO 26262 (used in car ECU development) list program-flow monitoring and windowed watchdogs among their recommended diagnostic measures. Safety-critical systems often have multiple watchdog layers: a software watchdog feeding a hardware watchdog feeding an external watchdog IC.

When to Use a Watchdog

Always use a watchdog if your system controls physical actuators.

Application | Watchdog Needed? | Consequence of Hang
LED blinker | Nice to have | LED stuck on or off
Data logger | Recommended | Missed data, corrupt files
Motor controller | Essential | Motor runs uncontrolled
Heater control | Essential | Fire hazard
Drone flight controller | Essential | Crash
Medical device | Essential + redundant | Patient safety risk

Even for non-safety-critical systems, a watchdog is good practice. A data logger that resets and resumes logging after a crash is infinitely better than one that hangs silently and logs nothing.

What Happens After a Watchdog Reset?

When the watchdog triggers, the MCU resets as if you pressed the reset button. Your firmware starts from the top of main. But you might want to know why you reset — was it a power-on or a watchdog timeout?

STM32 has reset status flags that tell you:

use embassy_stm32::pac;

// Check reset cause (read RCC reset status register)
// The exact register and bit names vary by STM32 family
let rcc = pac::RCC;
let csr = rcc.csr().read();

if csr.iwdgrstf() {
    defmt::warn!("RESET CAUSE: Independent Watchdog timeout!");
    // Log this, increment a crash counter, enter safe mode, etc.
}

if csr.wwdgrstf() {
    defmt::warn!("RESET CAUSE: Window Watchdog timeout!");
}

// Clear the reset flags so they do not persist after the next reset
rcc.csr().modify(|w| w.set_rmvf(true));
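
If you want a single answer rather than a series of flag checks, the raw register value can be decoded in one place. The sketch below assumes the STM32F4 layout of RCC_CSR (flags in bits 25 to 31); other families arrange the bits differently, and the enum and function names are invented. The pin-reset flag is checked last because it is usually set alongside the more specific causes.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ResetCause {
    IndependentWatchdog,
    WindowWatchdog,
    LowPower,
    Software,
    PowerOn,
    BrownOut,
    Pin,
    Unknown,
}

/// Decode a raw RCC_CSR value (STM32F4 bit layout assumed).
pub fn decode_f4_csr(csr: u32) -> ResetCause {
    // (bit position, cause), most specific first
    const FLAGS: [(u32, ResetCause); 7] = [
        (29, ResetCause::IndependentWatchdog), // IWDGRSTF
        (30, ResetCause::WindowWatchdog),      // WWDGRSTF
        (31, ResetCause::LowPower),            // LPWRRSTF
        (28, ResetCause::Software),            // SFTRSTF
        (27, ResetCause::PowerOn),             // PORRSTF
        (25, ResetCause::BrownOut),            // BORRSTF
        (26, ResetCause::Pin),                 // PINRSTF
    ];
    FLAGS
        .iter()
        .find(|&&(bit, _)| csr & (1u32 << bit) != 0)
        .map(|&(_, cause)| cause)
        .unwrap_or(ResetCause::Unknown)
}
```

Returning one enum value makes it easy to store the cause in a crash counter or to branch into a safe mode at startup.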

🧠 Think About It: After a watchdog reset, should your firmware immediately resume normal operation? Or should it enter a "safe mode" first? For a drone, you might want to enter a controlled descent instead of resuming the last flight command. Think about what "safe" means for your specific application.

Summary

The watchdog is your firmware's safety net. It is the simplest peripheral on the chip — just a countdown timer — but it might be the most important one. A firmware hang without a watchdog is an uncontrolled system. A firmware hang with a watchdog is a system reset followed by a clean recovery.

Use the IWDG for most projects. Start it early, pet it in your main loop (or better, use the multi-task heartbeat pattern), and choose a timeout that balances responsiveness with margin. Once the watchdog is unleashed, it cannot be stopped — and that is exactly the point.

With this chapter, you have covered all the core peripherals of the STM32. You can blink LEDs, read sensors over UART, SPI, and I2C, measure analog voltages, move data efficiently with DMA, and keep your system safe with a watchdog. In the next part of the book, we will go deeper — into memory architecture, embedded Rust patterns, and building complete real-world projects.

Memory Architecture

If you have been following along on an STM32F4 or G0, you may not have thought about memory at all. That is by design — simpler chips keep things simple. But once you reach the STM32H7, memory becomes something you actively manage. This chapter explains why, and gives you the mental model to handle it.

The Simple World: F1, F4, G0

On most STM32 chips, memory looks like this:

| Region | Typical Size | Address Start | Notes |
| --- | --- | --- | --- |
| Flash | 64KB–1MB | 0x0800_0000 | Program code lives here |
| SRAM | 20KB–128KB | 0x2000_0000 | Variables, stack, heap |

Everything connects through a single bus matrix. The CPU can read Flash and SRAM. The DMA controller can read and write SRAM. Peripherals talk to DMA, DMA talks to SRAM, the CPU reads SRAM. There is no data cache to keep coherent, so no tricks and no surprises.

CPU ──────┐
          ├──── Bus Matrix ──── Flash
DMA ──────┤                ──── SRAM
          ├──── Peripherals

Your linker script puts .text in Flash, .bss and .data in SRAM, and you never think about it again. This is the world most tutorials assume.

Think About It: If you have only worked with Arduino or simple STM32 boards, you have been living in this simple world the entire time — and that is perfectly fine for most projects.

The F4 Trap: CCM RAM

Some STM32F4 parts, such as the F405 and F407 (but not the F411 used in this book's examples), have a sneaky extra region called CCM (Core Coupled Memory). On the F407, it is 64KB sitting at address 0x1000_0000.

| Region | Size | DMA Access | CPU Access | Speed |
| --- | --- | --- | --- | --- |
| Main SRAM | 128KB | Yes | Yes | Fast |
| CCM RAM | 64KB | No | Yes | Fastest |

CCM is wired directly to the CPU with zero wait states, which makes it the fastest RAM on the chip. But DMA cannot reach it. If you place a UART transmit buffer in CCM and then ask DMA to send it, the transfer does not go through. There is no crash and no panic. At most the DMA stream raises its transfer-error flag, and if your code never checks that flag, the failure looks completely silent.

// DANGER: If this buffer lands in CCM, DMA transfers will silently fail
static mut TX_BUF: [u8; 256] = [0u8; 256];

The fix is straightforward: do not put DMA buffers in CCM. Use CCM for stack space or computation scratch pads where only the CPU reads and writes.
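A cheap way to catch this mistake early is to check a buffer's address range before handing it to DMA. The sketch below is a plain address comparison against the F407's CCM window (64KB at 0x1000_0000, as stated above); the function name `overlaps_ccm` is hypothetical and nothing here is a HAL API.

```rust
/// CCM RAM window on the STM32F407: 64 KB starting at 0x1000_0000.
pub const CCM_START: u32 = 0x1000_0000;
pub const CCM_LEN: u32 = 64 * 1024;

/// Returns true if any byte of `[addr, addr + len)` lies inside CCM,
/// which would make the buffer unusable as a DMA source or destination.
pub fn overlaps_ccm(addr: u32, len: u32) -> bool {
    if len == 0 {
        return false;
    }
    // Widen to u64 so that addr + len cannot overflow.
    let buf_start = addr as u64;
    let buf_end = buf_start + len as u64; // exclusive
    let ccm_start = CCM_START as u64;
    let ccm_end = ccm_start + CCM_LEN as u64; // exclusive
    buf_start < ccm_end && buf_end > ccm_start
}
```

On the target, a call such as `debug_assert!(!overlaps_ccm(buf.as_ptr() as u32, buf.len() as u32))` just before starting a transfer turns a silent failure into an immediate, obvious one during development.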

Fun Fact: Many developers have lost hours debugging "DMA not working" on F4 chips, only to discover their buffer was allocated in CCM. If DMA output looks like garbage, check your memory map first.

The Complex World: STM32H7

The H7 is a different beast. It has seven distinct RAM regions, each with different properties:

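| Region | Size | Address | CPU | DMA | Cache | Best For |
| --- | --- | --- | --- | --- | --- | --- |
| ITCM | 64KB | 0x0000_0000 | Yes | No | No | Fastest code execution |
| DTCM | 128KB | 0x2000_0000 | Yes | No | No | Fastest data, control variables |
| AXI SRAM | 512KB | 0x2400_0000 | Yes | Yes | Yes | Large general-purpose buffers |
| SRAM1 | 128KB | 0x3000_0000 | Yes | Yes | No* | DMA buffers |
| SRAM2 | 128KB | 0x3002_0000 | Yes | Yes | No* | DMA buffers |
| SRAM3 | 32KB | 0x3004_0000 | Yes | Yes | No* | DMA buffers |
| SRAM4 | 64KB | 0x3800_0000 | Yes | Yes | No* | D3 domain, low-power, BDMA |

* Uncached only while the D-cache is off or the MPU marks the region non-cacheable.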

That is over 1MB of RAM total, but you cannot just treat it as one big pool.

Why So Many Regions?

Each region connects to a different bus inside the chip. DTCM is wired straight to the Cortex-M7 core — zero wait states, no contention. AXI SRAM goes through the AXI bus and the L1 cache. SRAM1-3 sit on the AHB bus where DMA controllers live.

Cortex-M7 Core
  ├── ITCM (instruction tightly-coupled memory)
  ├── DTCM (data tightly-coupled memory)
  ├── L1 Cache ──── AXI Bus ──── AXI SRAM (512KB)
  └── AHB Bus ──── SRAM1/2/3 ──── DMA1, DMA2
                               ──── SRAM4 (D3 domain)

Which Memory for What

Here is the practical decision tree:

DTCM (128KB) — Use for variables the CPU accesses constantly: PID loop state, filter coefficients, control flags. It is the fastest data memory but DMA cannot reach it.

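SRAM1/2/3 (288KB total) — Use for any buffer that DMA touches: UART RX/TX buffers, SPI transfer buffers, ADC sample arrays. DMA1 and DMA2 reach these regions directly. Keep them uncached, either by leaving the D-cache off or by marking the range non-cacheable in the MPU once you turn the cache on, and you avoid the cache coherency headaches.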

AXI SRAM (512KB) — Use for large data structures that the CPU works with: sensor log arrays, lookup tables, image processing buffers. It goes through the cache, so it is fast for sequential CPU access but needs cache maintenance if DMA also touches it.

SRAM4 (64KB) — Special region in the D3 domain, which can stay powered while the rest of the chip sleeps. Use it for data shared with low-power modes and for buffers used by the D3-domain peripherals and their BDMA controller.

// Placed with #[link_section]; the names must match the linker script.
// These sections are NOLOAD, so the startup code does not initialize them:
// the initializers below are not applied at boot, and your code must write
// the starting values before first use.

// Control variables — fast CPU access, no DMA needed
#[link_section = ".dtcm_bss"]
static mut PID_STATE: PidState = PidState::new();

// DMA buffer — must be in SRAM1, not DTCM or AXI
#[link_section = ".sram1_bss"]
static mut UART_TX_BUF: [u8; 512] = [0u8; 512];

// Large working buffer — cached AXI SRAM is fine
#[link_section = ".axisram_bss"]
static mut SENSOR_LOG: [SensorReading; 4096] = [SensorReading::ZERO; 4096];
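The decision tree above can also be written down as a small function, which is a handy way to check that you have understood it. This is only a sketch: the `Needs` struct and `pick_region` function are hypothetical names, the capacities come from the region table, and SRAM1 stands in for the whole SRAM1/2/3 group.

```rust
/// RAM regions of the STM32H743 that the decision tree chooses between.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Region {
    Dtcm,
    Sram1,
    AxiSram,
    Sram4,
}

/// What a buffer or variable needs from its memory (hypothetical helper type).
pub struct Needs {
    /// A DMA controller reads or writes this data.
    pub dma: bool,
    /// The data is shared with low-power modes.
    pub low_power: bool,
    /// The CPU touches this data constantly (control loops, filters).
    pub cpu_hot: bool,
    pub size_bytes: usize,
}

/// Returns the region the decision tree suggests, or None if the data
/// does not fit in that region.
pub fn pick_region(n: &Needs) -> Option<Region> {
    let (region, capacity_kb) = if n.low_power {
        (Region::Sram4, 64)
    } else if n.dma {
        (Region::Sram1, 128)
    } else if n.cpu_hot {
        (Region::Dtcm, 128)
    } else {
        (Region::AxiSram, 512)
    };
    if n.size_bytes <= capacity_kb * 1024 {
        Some(region)
    } else {
        None
    }
}
```

Note the order of the checks: DMA reachability is tested before CPU speed, because a fast buffer that DMA cannot reach is useless as a DMA buffer.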

Linker Script Basics

The linker script is the file that tells the linker where each section of your program goes in physical memory. On simple chips, the default memory.x from cortex-m-rt works fine. On the H7, you need to customize it.

A minimal memory.x for the STM32H743:

MEMORY
{
  FLASH   : ORIGIN = 0x08000000, LENGTH = 2M
  /* cortex-m-rt places .data, .bss and the stack in the region named RAM;
     here that region is the DTCM */
  RAM     : ORIGIN = 0x20000000, LENGTH = 128K
  AXISRAM : ORIGIN = 0x24000000, LENGTH = 512K
  SRAM1   : ORIGIN = 0x30000000, LENGTH = 128K
  SRAM2   : ORIGIN = 0x30020000, LENGTH = 128K
  SRAM3   : ORIGIN = 0x30040000, LENGTH = 32K
  SRAM4   : ORIGIN = 0x38000000, LENGTH = 64K
  ITCM    : ORIGIN = 0x00000000, LENGTH = 64K
}

SECTIONS
{
  .sram1_bss (NOLOAD) : {
    *(.sram1_bss .sram1_bss.*);
  } > SRAM1

  .axisram_bss (NOLOAD) : {
    *(.axisram_bss .axisram_bss.*);
  } > AXISRAM

  /* DTCM is the RAM region, so .dtcm_bss shares it with .data and .bss */
  .dtcm_bss (NOLOAD) : {
    *(.dtcm_bss .dtcm_bss.*);
  } > RAM
}

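The default .bss and .data sections (your normal global variables) and the stack go into the region named RAM, because that is the name the cortex-m-rt linker script expects. Pointing RAM at the DTCM address range gives them the fastest data memory on the chip. If you use a generated memory.x instead of writing your own, check which address range it assigns to RAM.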

Think About It: The linker script is not magic — it is just a mapping from section names to address ranges. When you write #[link_section = ".sram1_bss"], you are telling the compiler "put this variable in the section called .sram1_bss," and the linker script says "that section lives in SRAM1 at address 0x3000_0000."

Cache Coherency — The H7 Gotcha

AXI SRAM goes through the Cortex-M7's L1 data cache. This means the CPU might be reading a cached copy of memory while DMA writes new data to the actual SRAM. They get out of sync.

Two solutions:

  1. Avoid the problem: Put DMA buffers in SRAM1-3 and keep that range uncached. The D-cache is off after reset; if you enable it, use the MPU to mark the DMA region non-cacheable. This is the simplest approach.

  2. Manage the cache: Invalidate the affected cache lines before the CPU reads data that DMA has written, and clean them before DMA reads data that the CPU has written. This is error-prone and rarely worth the complexity.

// The easy way: just use the right memory region
#[link_section = ".sram1_bss"]
static mut ADC_DMA_BUF: [u16; 256] = [0u16; 256];
// No cache worries, as long as SRAM1 is left uncached
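If you do take the cache-management route, the detail that bites most often is granularity: the Cortex-M7 data cache works in 32-byte lines, so cleaning or invalidating a buffer affects whatever else shares its first and last line. The sketch below shows only the address arithmetic, with hypothetical helper names; the actual clean and invalidate calls depend on the crate you use and are not shown.

```rust
/// Cortex-M7 data cache line size in bytes.
pub const CACHE_LINE: u32 = 32;

/// Expand `[addr, addr + len)` outward to whole cache lines.
/// Returns (aligned start address, length in bytes).
/// Assumes `addr + len` does not overflow a u32.
pub fn cache_line_span(addr: u32, len: u32) -> (u32, u32) {
    let mask = !(CACHE_LINE - 1);
    let start = addr & mask;
    let end = (addr + len + CACHE_LINE - 1) & mask;
    (start, end - start)
}

/// A buffer is safe to invalidate on its own only if it starts on a line
/// boundary and covers whole lines, so no neighbouring data shares a line.
pub fn is_cache_aligned(addr: u32, len: u32) -> bool {
    addr % CACHE_LINE == 0 && len % CACHE_LINE == 0
}
```

For example, a 40-byte buffer starting at 0x2400_0010 actually spans two full lines, 64 bytes starting at 0x2400_0000. Invalidating it would also discard any unwritten CPU data in the first 16 bytes of that range, which is why DMA buffers in cached memory are usually aligned and padded to 32 bytes.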

Practical Advice

For F0, G0, F1, F3, L0, L4, G4 — do not worry about most of this. Main SRAM works with DMA and there is no data cache. (A few of these parts, such as the F303, add a small CCM region with the same DMA caveat as the F4.) Use the default linker script and move on.

For F4 — be aware of CCM. If DMA mysteriously fails, check whether your buffer is in CCM. Otherwise, treat it like the simple chips.

For H7 — take 30 minutes to set up your linker script correctly at the start of the project. Put DMA buffers in SRAM1. Put your stack and hot variables in DTCM. Use AXI SRAM for large CPU-only data. Then you can stop thinking about it for the rest of the project.

Fun Fact: The STM32H7's total internal RAM (over 1MB) is larger than the entire Flash memory of many STM32F0 chips. Memory architecture reflects the enormous range of complexity in the STM32 family.

Summary

| Chip Family | Memory Model | Key Concern |
| --- | --- | --- |
| F0, G0, L0 | One RAM, simple | None |
| F1, F3, L4, G4 | One RAM, simple | None |
| F4 | Main SRAM + CCM | CCM is CPU-only, no DMA |
| F7 | Multiple SRAM + cache | Cache coherency with DMA |
| H7 | 7 RAM regions + cache | Region placement for DMA, cache coherency |

Start simple. Graduate to complexity only when your chip demands it.

Embedded Rust Patterns

Writing Rust for a microcontroller is not quite the same as writing Rust for a desktop application. You have no operating system, no allocator (usually), and every byte of RAM counts. This chapter covers the patterns that experienced embedded Rust developers reach for again and again.

No Heap: The heapless Crate

On a microcontroller, calling malloc is risky. Heap fragmentation can cause allocation failures hours or days into a mission — the worst kind of bug. The standard Vec, String, and VecDeque all use the heap. The heapless crate gives you fixed-capacity alternatives that live entirely on the stack or in static memory.

use heapless::Vec;
use heapless::String;

// A Vec that holds at most 64 readings — no heap allocation
let mut readings: Vec<f32, 64> = Vec::new();

readings.push(23.5).ok();  // Returns Err if full
readings.push(24.1).ok();

// A String with a maximum of 128 bytes
let mut msg: String<128> = String::new();
core::fmt::write(&mut msg, format_args!("Temp: {:.1}C", readings[0])).ok();

The capacity is part of the type. Vec<f32, 64> and Vec<f32, 128> are different types. The compiler knows exactly how much memory each one needs, and it is allocated at compile time.

| Type | Heap Version | Heapless Version | Notes |
| --- | --- | --- | --- |
| Growable array | Vec<T> | Vec<T, N> | Fixed max capacity N |
| String | String | String<N> | N is max byte length |
| Queue | VecDeque<T> | Deque<T, N> | Ring buffer, fixed size |
| Map | HashMap<K,V> | LinearMap<K,V,N> | Linear search, small N only |
| Producer/Consumer | mpsc::channel | spsc::Queue<T, N> | Lock-free, single producer/consumer |

Fun Fact: The heapless crate's spsc::Queue is interrupt-safe without disabling interrupts. It uses atomic operations to let one context produce and another consume without locks. This is exactly what you need for passing data from an interrupt handler to a main loop.

Handle Every Error

In desktop Rust, calling .unwrap() on a Result panics with a nice error message and stack trace. On a microcontroller, a panic usually means the chip resets or hangs. There is no terminal to print a stack trace to (unless you set up defmt).

// BAD: panic if the sensor read fails
let temperature = sensor.read_temperature().unwrap();

// GOOD: handle the error gracefully
let temperature = match sensor.read_temperature() {
    Ok(t) => t,
    Err(_) => {
        defmt::warn!("Sensor read failed, using last known value");
        last_known_temperature
    }
};

Common embedded error handling strategies:

Fallback value — Use the last known good value. Good for telemetry where a stale reading is better than no reading.

Retry with backoff — Try again after a delay. Good for communication interfaces that might be temporarily busy.

Degrade gracefully — Switch to a simpler mode of operation. If the barometer fails, continue flying with GPS altitude only.

Reset the peripheral — Some peripherals get into bad states. Reinitializing the SPI bus or the sensor can clear the fault.

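use embassy_time::Timer;
use embedded_hal::i2c::I2c;

// Retry pattern with a maximum attempt count.
// Generic over the bus type, because I2c is a trait and not a concrete type.
// read_pressure() is the blocking method of the Bmp280 driver shown later in
// this chapter, so only the delay between attempts is awaited.
async fn read_with_retry<I2C: I2c>(sensor: &mut Bmp280<I2C>) -> Option<f32> {
    for attempt in 1..=3 {
        match sensor.read_pressure() {
            Ok(p) => return Some(p),
            Err(e) => {
                // embedded-hal errors implement Debug, not defmt::Format
                defmt::warn!(
                    "Read attempt {} failed: {:?}",
                    attempt,
                    defmt::Debug2Format(&e)
                );
                Timer::after_millis(10).await;
            }
        }
    }
    defmt::error!("Sensor read failed after 3 attempts");
    None
}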

Think About It: In a safety-critical system, "what happens when this fails?" is the most important question you can ask about every single line of code. The Rust compiler forces you to answer it for every Result — use that to your advantage.

State Machines with Enums

Embedded systems are full of state machines. A motor controller might be in Idle, Spinning Up, Running, or Fault. A communication protocol might be WaitingForHeader, ReceivingPayload, or ProcessingMessage. Rust enums model these perfectly.

#[derive(Debug, Clone, Copy, PartialEq, defmt::Format)]
enum FlightMode {
    Disarmed,
    Armed,
    Stabilize { throttle: f32 },
    ReturnToHome { target_lat: f32, target_lon: f32 },
    Emergency,
}

// Events that drive the state machine
#[derive(Debug, Clone, Copy, PartialEq, defmt::Format)]
enum Event {
    ArmSwitch,
    DisarmSwitch,
    ThrottleUp(f32),
    LowBattery,
}

fn next_mode(current: FlightMode, event: Event) -> FlightMode {
    match (current, event) {
        (FlightMode::Disarmed, Event::ArmSwitch) => FlightMode::Armed,
        (FlightMode::Armed, Event::ThrottleUp(t)) => {
            FlightMode::Stabilize { throttle: t }
        }
        (_, Event::LowBattery) => FlightMode::Emergency,
        (_, Event::DisarmSwitch) => FlightMode::Disarmed,
        (state, _) => state, // Ignore unhandled events
    }
}

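The compiler guarantees exhaustive matching, with one caveat: a catch-all arm such as (state, _) => state tells the compiler that every case is covered, including cases you add later. For the check to bite, list the variants explicitly. Then, if you add a new variant to FlightMode, every match that does not handle it will fail to compile. This is enormously valuable, because you cannot accidentally forget to handle a new state.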
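As a sketch of a transition function that the compiler can check in full, the following models the motor controller states mentioned at the start of this section and uses no catch-all arm. The event names are hypothetical, chosen only for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MotorState {
    Idle,
    SpinningUp,
    Running,
    Fault,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MotorEvent {
    Start,
    AtSpeed,
    Stop,
    Overcurrent,
}

/// Every (state, event) pair is spelled out. Adding a variant to either
/// enum makes this function fail to compile until the new case is handled.
pub fn next_state(state: MotorState, event: MotorEvent) -> MotorState {
    use MotorEvent::*;
    use MotorState::*;
    match state {
        Idle => match event {
            Start => SpinningUp,
            Overcurrent => Fault,
            AtSpeed | Stop => Idle,
        },
        SpinningUp => match event {
            AtSpeed => Running,
            Stop => Idle,
            Overcurrent => Fault,
            Start => SpinningUp,
        },
        Running => match event {
            Stop => Idle,
            Overcurrent => Fault,
            Start | AtSpeed => Running,
        },
        // A fault is latched until the operator explicitly stops the motor.
        Fault => match event {
            Stop => Idle,
            Start | AtSpeed | Overcurrent => Fault,
        },
    }
}
```

The cost is a few more lines per state. The benefit is that "ignore this event" becomes a visible decision for each state and not a default that silently applies to states added later.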

The Newtype Pattern for Units

Mixing up units is a classic embedded bug. Is that angle in degrees or radians? Is that distance in meters or centimeters? The newtype pattern uses Rust's type system to make unit confusion a compile-time error.

#![allow(unused)]
fn main() {
#[derive(Debug, Clone, Copy)]
struct Meters(f32);

#[derive(Debug, Clone, Copy)]
struct Degrees(f32);

#[derive(Debug, Clone, Copy)]
struct Radians(f32);

impl Degrees {
    fn to_radians(self) -> Radians {
        Radians(self.0 * core::f32::consts::PI / 180.0)
    }
}

fn set_heading(heading: Degrees) {
    // Unambiguous — this function takes degrees, not radians
    let rad = heading.to_radians();
    // ...
}

// This will NOT compile:
// set_heading(Meters(100.0));  // Error: expected Degrees, found Meters
}

This costs nothing at runtime. The compiler strips away the wrapper type and works directly with the inner f32. You get type safety for free.
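Newtypes become more pleasant to use once they support the arithmetic and conversions that make sense for the unit. The sketch below is one possible shape: the `Centimeters` type and the choice of which operations to allow are assumptions for illustration. It also checks the "costs nothing" claim for size.

```rust
use core::ops::Add;

#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
pub struct Meters(pub f32);

#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
pub struct Centimeters(pub f32);

// Adding two lengths in the same unit is meaningful, so allow it.
impl Add for Meters {
    type Output = Meters;
    fn add(self, rhs: Meters) -> Meters {
        Meters(self.0 + rhs.0)
    }
}

// Converting between units is explicit and lives in exactly one place.
impl From<Centimeters> for Meters {
    fn from(cm: Centimeters) -> Meters {
        Meters(cm.0 / 100.0)
    }
}

/// Total distance from a reading in meters and an offset in centimeters.
pub fn total_distance(reading: Meters, offset: Centimeters) -> Meters {
    reading + Meters::from(offset)
}
```

There is deliberately no `Add<Centimeters> for Meters`: writing `Meters(1.0) + Centimeters(50.0)` does not compile, so the conversion has to be spelled out at the call site.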

Fun Fact: The Mars Climate Orbiter was lost in 1999 because one software module produced thrust values in pound-force seconds while another expected newton-seconds. A newtype pattern would have caught this at compile time.

Portable Drivers with embedded-hal

The embedded-hal crate defines traits for common hardware interfaces: SpiDevice, I2c, InputPin, OutputPin, and so on. If you write a sensor driver that is generic over these traits, it works on any microcontroller that implements them — STM32, nRF52, ESP32, RP2040.

#![allow(unused)]
fn main() {
use embedded_hal::i2c::I2c;

/// A BMP280 barometer driver that works on any platform
pub struct Bmp280<I2C> {
    i2c: I2C,
    address: u8,
}

impl<I2C: I2c> Bmp280<I2C> {
    pub fn new(i2c: I2C, address: u8) -> Self {
        Self { i2c, address }
    }

    pub fn read_pressure(&mut self) -> Result<f32, I2C::Error> {
        let mut buf = [0u8; 3];
        self.i2c.write_read(self.address, &[0xF7], &mut buf)?;
        let raw = ((buf[0] as u32) << 12)
                | ((buf[1] as u32) << 4)
                | ((buf[2] as u32) >> 4);
        Ok(self.compensate_pressure(raw))
    }

    fn compensate_pressure(&self, raw: u32) -> f32 {
        // Compensation algorithm from BMP280 datasheet
        // (simplified for illustration)
        raw as f32 / 256.0
    }
}
}

This driver compiles for thumbv7em-none-eabihf (STM32) and thumbv6m-none-eabi (RP2040) and riscv32imc-unknown-none-elf (ESP32-C3) without changing a single line.
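A second benefit of writing drivers against a trait is that they can be tested on your desktop with a fake bus. The sketch below avoids external crates by defining a local `RegisterBus` trait as a hypothetical stand-in for embedded-hal's `I2c`; `MockBus` and `read_raw_pressure` are likewise illustrative names. The register address 0xF7 and the bit packing follow the driver above.

```rust
/// Local stand-in for an embedded-hal style I2C trait (hypothetical, so the
/// sketch builds on a host without extra crates).
pub trait RegisterBus {
    type Error;
    fn write_read(&mut self, address: u8, write: &[u8], read: &mut [u8])
        -> Result<(), Self::Error>;
}

/// A fake bus that answers one specific request with canned bytes.
pub struct MockBus {
    pub expected_address: u8,
    pub response: [u8; 3],
}

impl RegisterBus for MockBus {
    type Error = ();

    fn write_read(&mut self, address: u8, write: &[u8], read: &mut [u8])
        -> Result<(), ()>
    {
        // Reject anything other than "read 3 bytes starting at 0xF7".
        if address != self.expected_address
            || write != [0xF7u8].as_slice()
            || read.len() != self.response.len()
        {
            return Err(());
        }
        read.copy_from_slice(&self.response);
        Ok(())
    }
}

/// Same packing as the driver: 20-bit raw pressure from three bytes.
pub fn read_raw_pressure<B: RegisterBus>(bus: &mut B, address: u8)
    -> Result<u32, B::Error>
{
    let mut buf = [0u8; 3];
    bus.write_read(address, &[0xF7], &mut buf)?;
    Ok(((buf[0] as u32) << 12) | ((buf[1] as u32) << 4) | ((buf[2] as u32) >> 4))
}
```

With real embedded-hal traits the idea is the same: implement the trait for a mock type in your test module and exercise the driver's byte handling without any hardware attached.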

Zero-Cost Abstractions

Rust's generics and trait system are monomorphized at compile time. When you write a function generic over a trait, the compiler generates a specialized version for each concrete type. There is no vtable lookup, no dynamic dispatch, no overhead.

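use embedded_hal::i2c::I2c;

// This generic function:
fn read_sensor<B: I2c>(i2c: &mut B, addr: u8) -> u16 {
    let mut buf = [0u8; 2];
    i2c.read(addr, &mut buf).ok();
    u16::from_be_bytes(buf)
}

// Compiles to the SAME assembly as a version written against the concrete
// type (the generic parameters of embassy's I2c depend on the crate version):
fn read_sensor_stm32(
    i2c: &mut embassy_stm32::i2c::I2c<'_, embassy_stm32::mode::Blocking>,
    addr: u8,
) -> u16 {
    let mut buf = [0u8; 2];
    i2c.blocking_read(addr, &mut buf).ok();
    u16::from_be_bytes(buf)
}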

You can verify this yourself on Compiler Explorer. The generic version and the concrete version produce identical machine code.

Const Generics for Buffer Sizes

Const generics let you parameterize types and functions over constant values. This is perfect for embedded systems where buffer sizes are fixed at compile time but might differ between use cases.

struct RingBuffer<T, const N: usize> {
    buf: [Option<T>; N],
    head: usize,
    tail: usize,
}

impl<T: Copy, const N: usize> RingBuffer<T, N> {
    // Slots start out as None so that new() can stay a const fn:
    // T::default() cannot be called in a const context on stable Rust.
    const fn new() -> Self {
        Self {
            buf: [None; N],
            head: 0,
            tail: 0,
        }
    }

    // One slot always stays empty to tell "full" from "empty",
    // so the usable capacity is N - 1.
    fn push(&mut self, item: T) -> bool {
        let next = (self.head + 1) % N;
        if next == self.tail {
            return false; // Full
        }
        self.buf[self.head] = Some(item);
        self.head = next;
        true
    }

    fn pop(&mut self) -> Option<T> {
        if self.head == self.tail {
            return None; // Empty
        }
        let item = self.buf[self.tail].take();
        self.tail = (self.tail + 1) % N;
        item
    }
}

// Different sizes for different needs — all statically allocated.
// Access to a static mut needs unsafe and exclusive use by one context.
static mut IMU_BUF: RingBuffer<ImuReading, 128> = RingBuffer::new();
static mut BARO_BUF: RingBuffer<BaroReading, 16> = RingBuffer::new();

Think About It: Every pattern in this chapter shares a common theme: move decisions to compile time. Fixed-size collections, exhaustive state matching, unit types, monomorphized generics, const-sized buffers. The more the compiler knows, the fewer bugs can hide until runtime.

Summary

| Pattern | Problem It Solves | Runtime Cost |
| --- | --- | --- |
| heapless collections | Heap fragmentation | Zero — stack allocated |
| Explicit error handling | Silent failures | Zero — compiler enforced |
| Enum state machines | Forgotten states | Zero — compiler enforced |
| Newtype units | Unit confusion | Zero — erased at compile time |
| embedded-hal traits | Platform lock-in | Zero — monomorphized |
| Const generics | Hardcoded buffer sizes | Zero — compile-time constants |

These patterns are not just nice-to-haves. In embedded systems, they are the difference between a prototype that works on your desk and a product that works in the field for years.

Building a Complete Sensor Hub

It is time to put everything together. In this chapter, we build a complete sensor hub — the kind of system you would find at the heart of a drone flight controller, a weather station, or an industrial monitoring unit. Every concept from the previous chapters appears here: SPI, I2C, ADC, DMA, timers, UART, watchdogs, channels, and async tasks.

What We Are Building

Our sensor hub reads from multiple sensors at different rates, fuses the data, and streams telemetry over UART:

| Component | Interface | Rate | Purpose |
| --- | --- | --- | --- |
| ICM-42688 IMU | SPI1 | 1000 Hz | Accelerometer + Gyroscope |
| BMP280 Barometer | I2C1 | 50 Hz | Altitude estimation |
| Battery voltage | ADC1 CH0 | 10 Hz | Battery monitoring |
| Status LED | GPIO | Variable | System health indicator |
| Telemetry output | UART1 | 50 Hz | Data stream to ground station |
| Watchdog | IWDG | Pet every 500 ms (1 s timeout) | Reset if firmware hangs |

This is a real-world architecture. The IMU runs at 1kHz because attitude estimation needs high-frequency inertial data. The barometer runs at 50Hz because pressure changes slowly. The battery monitor runs at 10Hz because voltage changes even more slowly. Each component runs at the rate it actually needs — no faster, no slower.

Architecture: Embassy Tasks

We structure the system as independent Embassy tasks communicating through channels:

┌─────────────┐     ┌──────────┐
│  imu_task   │────>│          │     ┌──────────────┐
│  (1000 Hz)  │     │          │     │              │
├─────────────┤     │ Channel  │────>│ telemetry    │──── UART TX
│  baro_task  │────>│ (64 deep)│     │ _task (50Hz) │
│  (50 Hz)    │     │          │     │              │
├─────────────┤     │          │     └──────────────┘
│ battery_task│────>│          │
│  (10 Hz)    │     └──────────┘
├─────────────┤
│  led_task   │  (reads system state independently)
│  (variable) │
├─────────────┤
│    main     │  (pets watchdog every 500ms)
└─────────────┘

Each task owns its peripheral. No shared mutable state, no mutexes for the common case. Data flows in one direction: sensors produce readings, the telemetry task consumes them.

Data Types

First, we define the data structures that flow through the system:

#![allow(unused)]
fn main() {
use heapless::String;

#[derive(Clone, Copy, Debug, defmt::Format)]
pub struct ImuReading {
    pub accel: [f32; 3],  // m/s^2, XYZ
    pub gyro: [f32; 3],   // rad/s, XYZ
    pub timestamp_ms: u64,
}

#[derive(Clone, Copy, Debug, defmt::Format)]
pub struct BaroReading {
    pub pressure_pa: f32,
    pub temperature_c: f32,
    pub timestamp_ms: u64,
}

#[derive(Clone, Copy, Debug, defmt::Format)]
pub struct BatteryReading {
    pub voltage: f32,
    pub percentage: u8,
    pub timestamp_ms: u64,
}

#[derive(Clone, Copy, Debug, defmt::Format)]
pub enum SensorData {
    Imu(ImuReading),
    Baro(BaroReading),
    Battery(BatteryReading),
}

#[derive(Clone, Copy, Debug, PartialEq, defmt::Format)]
pub enum SystemState {
    Initializing,
    Running,
    LowBattery,
    SensorFault,
}
}

Think About It: The SensorData enum lets us send different reading types through a single channel. The telemetry task matches on the variant to format each type appropriately. This is much cleaner than having three separate channels.

The Main Entry Point

#![no_std]
#![no_main]

use embassy_executor::Spawner;
use embassy_stm32::{self, Config};
use embassy_stm32::gpio::{Level, Output, Speed};
use embassy_stm32::wdg::IndependentWatchdog;
use embassy_sync::channel::Channel;
use embassy_sync::blocking_mutex::raw::CriticalSectionRawMutex;
use embassy_time::Timer;
use defmt_rtt as _;
use panic_probe as _;

use crate::{SensorData, SystemState};

// Channel: up to 64 readings buffered
static SENSOR_CHANNEL: Channel<CriticalSectionRawMutex, SensorData, 64> =
    Channel::new();

// System state shared via atomic-like signal
static SYSTEM_STATE: embassy_sync::signal::Signal<
    CriticalSectionRawMutex, SystemState
> = embassy_sync::signal::Signal::new();

#[embassy_executor::main]
async fn main(spawner: Spawner) {
    // Configure clocks for H743: 480 MHz system clock
    let mut config = Config::default();
    {
        use embassy_stm32::rcc::*;
        config.rcc.hse = Some(Hse {
            freq: embassy_stm32::time::Hertz(25_000_000),
            mode: HseMode::Oscillator,
        });
        config.rcc.pll1 = Some(Pll {
            source: PllSource::HSE,
            prediv: PllPreDiv::DIV5,
            mul: PllMul::MUL192,
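            divp: Some(PllDiv::DIV2),  // 960 MHz / 2 = 480 MHz
            divq: Some(PllDiv::DIV8),  // 960 MHz / 8 = 120 MHz for SPI
            divr: None,
        });
        config.rcc.sys = Sysclk::PLL1_P;
        // At 480 MHz the bus clocks must be divided down and the core
        // voltage scale raised. Field names follow embassy-stm32's H7
        // examples and may differ between releases.
        config.rcc.ahb_pre = AHBPrescaler::DIV2;   // 240 MHz
        config.rcc.apb1_pre = APBPrescaler::DIV2;  // 120 MHz
        config.rcc.apb2_pre = APBPrescaler::DIV2;
        config.rcc.apb3_pre = APBPrescaler::DIV2;
        config.rcc.apb4_pre = APBPrescaler::DIV2;
        config.rcc.voltage_scale = VoltageScale::Scale0;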
    }
    let p = embassy_stm32::init(config);
    defmt::info!("Sensor hub starting");

    // Initialize watchdog — 1 second timeout
    let mut wdg = IndependentWatchdog::new(p.IWDG1, 1_000_000);
    wdg.unleash();

    // Spawn all tasks
    spawner.spawn(imu_task(p.SPI1, p.PA5, p.PA7, p.PA6, p.PA4)).unwrap();
    spawner.spawn(baro_task(p.I2C1, p.PB6, p.PB7)).unwrap();
    spawner.spawn(battery_task(p.ADC1, p.PA0)).unwrap();
    spawner.spawn(telemetry_task(p.USART1, p.PA9, p.PA10)).unwrap();
    spawner.spawn(led_task(p.PE1)).unwrap();

    SYSTEM_STATE.signal(SystemState::Running);
    defmt::info!("All tasks spawned, entering watchdog loop");

    // Main loop: pet the watchdog
    loop {
        wdg.pet();
        Timer::after_millis(500).await;
    }
}

The IMU Task (1kHz SPI)

#![allow(unused)]
fn main() {
#[embassy_executor::task]
async fn imu_task(
    spi_peri: embassy_stm32::peripherals::SPI1,
    sck: embassy_stm32::peripherals::PA5,
    mosi: embassy_stm32::peripherals::PA7,
    miso: embassy_stm32::peripherals::PA6,
    cs_pin: embassy_stm32::peripherals::PA4,
) {
    use embassy_stm32::spi::{self, Spi};
    use embassy_stm32::gpio::{Level, Output, Speed};

    let mut spi_config = spi::Config::default();
    spi_config.frequency = embassy_stm32::time::Hertz(8_000_000);

    let mut spi = Spi::new_blocking(
        spi_peri, sck, mosi, miso, spi_config,
    );
    let mut cs = Output::new(cs_pin, Level::High, Speed::VeryHigh);

    // Initialize ICM-42688
    icm42688_init(&mut spi, &mut cs);
    defmt::info!("IMU initialized");

    let mut ticker = embassy_time::Ticker::every(
        embassy_time::Duration::from_hz(1000)
    );

    loop {
        ticker.next().await;

        match icm42688_read(&mut spi, &mut cs) {
            Ok(reading) => {
                SENSOR_CHANNEL.send(SensorData::Imu(reading)).await;
            }
            Err(e) => {
                defmt::warn!("IMU read error: {:?}", e);
                SYSTEM_STATE.signal(SystemState::SensorFault);
            }
        }
    }
}

fn icm42688_init(
    spi: &mut impl embedded_hal::spi::SpiBus,
    cs: &mut Output<'_>,
) {
    // Write to PWR_MGMT register: enable accel + gyro
    cs.set_low();
    spi.write(&[0x4E, 0x0F]).ok();
    cs.set_high();
}

fn icm42688_read(
    spi: &mut impl embedded_hal::spi::SpiBus,
    cs: &mut Output<'_>,
) -> Result<ImuReading, ()> {
    const G: f32 = 9.80665; // m/s^2 per g
    const DEG_TO_RAD: f32 = core::f32::consts::PI / 180.0;

    let mut buf = [0u8; 13]; // 1 addr + 6 accel + 6 gyro
    buf[0] = 0x1F | 0x80;   // ACCEL_DATA_X1, read flag

    cs.set_low();
    let result = spi.transfer_in_place(&mut buf);
    cs.set_high(); // release chip select even if the transfer failed
    result.map_err(|_| ())?;

    // ±16 g range: 2048 LSB per g, converted to m/s^2
    let accel = [
        i16::from_be_bytes([buf[1], buf[2]]) as f32 / 2048.0 * G,
        i16::from_be_bytes([buf[3], buf[4]]) as f32 / 2048.0 * G,
        i16::from_be_bytes([buf[5], buf[6]]) as f32 / 2048.0 * G,
    ];
    // ±2000 deg/s range: 16.4 LSB per deg/s, converted to rad/s
    let gyro = [
        i16::from_be_bytes([buf[7], buf[8]]) as f32 / 16.4 * DEG_TO_RAD,
        i16::from_be_bytes([buf[9], buf[10]]) as f32 / 16.4 * DEG_TO_RAD,
        i16::from_be_bytes([buf[11], buf[12]]) as f32 / 16.4 * DEG_TO_RAD,
    ];

    Ok(ImuReading {
        accel,
        gyro,
        timestamp_ms: embassy_time::Instant::now().as_millis(),
    })
}
}
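The divisors 2048 and 16.4 in the read function are sensitivities: counts per g at the ±16 g accelerometer range and counts per degree-per-second at the ±2000 °/s gyro range. Since ImuReading documents its fields in m/s² and rad/s, the raw counts need one more scaling step to reach SI units. Here is that arithmetic as a host-testable sketch; the constant and function names are hypothetical.

```rust
/// Standard gravity in m/s^2.
pub const G_TO_MS2: f32 = 9.80665;
/// Accelerometer sensitivity at the ±16 g range, in counts per g.
pub const ACCEL_LSB_PER_G: f32 = 2048.0;
/// Gyro sensitivity at the ±2000 deg/s range, in counts per deg/s.
pub const GYRO_LSB_PER_DPS: f32 = 16.4;
/// Degrees to radians.
pub const DEG_TO_RAD: f32 = core::f32::consts::PI / 180.0;

/// Raw accelerometer count to metres per second squared.
pub fn accel_to_ms2(raw: i16) -> f32 {
    raw as f32 / ACCEL_LSB_PER_G * G_TO_MS2
}

/// Raw gyro count to radians per second.
pub fn gyro_to_rad_s(raw: i16) -> f32 {
    raw as f32 / GYRO_LSB_PER_DPS * DEG_TO_RAD
}
```

A board lying flat on a desk should report roughly 2048 counts on the vertical axis, which this converts to about 9.8 m/s². If you change the sensor's full-scale range, the two sensitivity constants must change with it.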

The Barometer Task (50Hz I2C)

#![allow(unused)]
fn main() {
#[embassy_executor::task]
async fn baro_task(
    i2c_peri: embassy_stm32::peripherals::I2C1,
    scl: embassy_stm32::peripherals::PB6,
    sda: embassy_stm32::peripherals::PB7,
) {
    use embassy_stm32::i2c::I2c;

    let i2c = I2c::new_blocking(
        i2c_peri, scl, sda,
        embassy_stm32::time::Hertz(400_000),
        Default::default(),
    );
    let mut bmp = Bmp280::new(i2c, 0x76);

    defmt::info!("Barometer initialized");
    let mut ticker = embassy_time::Ticker::every(
        embassy_time::Duration::from_hz(50)
    );

    loop {
        ticker.next().await;

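        // read() is assumed to return a complete BaroReading (pressure,
        // temperature, timestamp); the Bmp280 sketch in the previous chapter
        // exposes only read_pressure() and would need extending.
        match bmp.read() {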
            Ok(reading) => {
                SENSOR_CHANNEL.send(SensorData::Baro(reading)).await;
            }
            Err(_) => {
                defmt::warn!("Baro read failed");
            }
        }
    }
}
}

The Battery Task (10Hz ADC)

#![allow(unused)]
fn main() {
#[embassy_executor::task]
async fn battery_task(
    adc_peri: embassy_stm32::peripherals::ADC1,
    pin: embassy_stm32::peripherals::PA0,
) {
    use embassy_stm32::adc::Adc;

    let mut adc = Adc::new(adc_peri);
    let mut bat_pin = pin;

    let mut ticker = embassy_time::Ticker::every(
        embassy_time::Duration::from_hz(10)
    );

    loop {
        ticker.next().await;

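        let raw: u16 = adc.blocking_read(&mut bat_pin);
        // Voltage divider: 10k on top, 3.3k on the bottom
        // ADC reference = 3.3V. The math assumes 12-bit samples (4096 counts);
        // the H7 ADC can sample at up to 16 bits, so set the resolution to
        // 12 bits or change the divisor to match.
        let adc_voltage = raw as f32 * 3.3 / 4096.0;
        let battery_voltage = adc_voltage * (10.0 + 3.3) / 3.3;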

        let percentage = voltage_to_percentage(battery_voltage);

        let reading = BatteryReading {
            voltage: battery_voltage,
            percentage,
            timestamp_ms: embassy_time::Instant::now().as_millis(),
        };

        if percentage < 20 {
            SYSTEM_STATE.signal(SystemState::LowBattery);
        }

        SENSOR_CHANNEL.send(SensorData::Battery(reading)).await;
    }
}

fn voltage_to_percentage(v: f32) -> u8 {
    // Simple linear mapping for a 3S LiPo (9.0V–12.6V)
    let pct = ((v - 9.0) / (12.6 - 9.0) * 100.0) as i32;
    pct.clamp(0, 100) as u8
}
}

The Telemetry Task (50Hz UART)

#![allow(unused)]
fn main() {
#[embassy_executor::task]
async fn telemetry_task(
    uart_peri: embassy_stm32::peripherals::USART1,
    tx_pin: embassy_stm32::peripherals::PA9,
    rx_pin: embassy_stm32::peripherals::PA10,
) {
    use embassy_stm32::usart::{Config as UartConfig, UartTx};

    let mut config = UartConfig::default();
    config.baudrate = 115200;

    let mut tx = UartTx::new_blocking(uart_peri, tx_pin, config).unwrap();

    let mut last_imu = ImuReading { accel: [0.0; 3], gyro: [0.0; 3], timestamp_ms: 0 };
    let mut last_baro = BaroReading { pressure_pa: 0.0, temperature_c: 0.0, timestamp_ms: 0 };
    let mut last_bat = BatteryReading { voltage: 0.0, percentage: 0, timestamp_ms: 0 };

    let mut msg_buf: heapless::String<256> = heapless::String::new();

    loop {
        // Receive the next sensor reading (suspends this task until one arrives)
        let data = SENSOR_CHANNEL.receive().await;

        match data {
            SensorData::Imu(r) => last_imu = r,
            SensorData::Baro(r) => last_baro = r,
            SensorData::Battery(r) => last_bat = r,
        }

        // Format and send telemetry every time we get a baro reading (~50Hz)
        if matches!(data, SensorData::Baro(_)) {
            msg_buf.clear();
            core::fmt::write(
                &mut msg_buf,
                format_args!(
                    "T:{} A:{:.1},{:.1},{:.1} G:{:.1},{:.1},{:.1} P:{:.0} B:{:.1}V {}%\r\n",
                    last_imu.timestamp_ms,
                    last_imu.accel[0], last_imu.accel[1], last_imu.accel[2],
                    last_imu.gyro[0], last_imu.gyro[1], last_imu.gyro[2],
                    last_baro.pressure_pa,
                    last_bat.voltage, last_bat.percentage,
                ),
            ).ok();

            tx.blocking_write(msg_buf.as_bytes()).ok();
        }
    }
}
}
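One thing worth checking in a design like this is how long each telemetry frame occupies the UART, because blocking_write does not yield to the executor while it runs. The arithmetic is simple enough to sketch on the host. The 80-byte frame length below is an assumption (the real length depends on the formatted values), and the function names are hypothetical.

```rust
/// Microseconds needed to shift `bytes` out at `baud`, assuming 8N1 framing
/// (1 start bit + 8 data bits + 1 stop bit = 10 bits per byte).
pub fn uart_tx_time_us(bytes: u32, baud: u32) -> u32 {
    (bytes as u64 * 10 * 1_000_000 / baud as u64) as u32
}

/// Share of each second the UART spends transmitting, in whole percent
/// (rounded down).
pub fn uart_utilisation_percent(bytes_per_frame: u32, frames_per_sec: u32, baud: u32) -> u32 {
    (bytes_per_frame as u64 * 10 * frames_per_sec as u64 * 100 / baud as u64) as u32
}
```

At 115200 baud an 80-byte frame takes about 6.9 ms, and 50 frames per second keep the line busy roughly a third of the time. During each blocking write the 1 kHz IMU task cannot run, so it would miss several ticks per frame. Raising the baud rate shortens the stall (about 0.9 ms at 921600 baud), and a DMA-backed asynchronous write removes it by letting other tasks run while the bytes go out.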

The Status LED Task

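#[embassy_executor::task]
async fn led_task(pin: embassy_stm32::peripherals::PE1) {
    let mut led = Output::new(pin, Level::Low, Speed::Low);
    // Remember the last state. A Signal hands each value out only once, so
    // waiting on it every cycle would blink just once per state change.
    let mut state = SystemState::Initializing;

    loop {
        // Pick up a new state if one was signalled; otherwise keep blinking
        // the current pattern (try_take is available in recent embassy-sync).
        if let Some(new_state) = SYSTEM_STATE.try_take() {
            state = new_state;
        }

        // Blink pattern indicates system state
        let (on_ms, off_ms) = match state {
            SystemState::Initializing => (100, 900),  // Slow pulse
            SystemState::Running => (50, 1950),        // Brief heartbeat
            SystemState::LowBattery => (200, 200),     // Fast blink
            SystemState::SensorFault => (500, 500),    // Medium blink
        };

        led.set_high();
        Timer::after_millis(on_ms).await;
        led.set_low();
        Timer::after_millis(off_ms).await;
    }
}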

Fun Fact: Many flight controllers use LED blink patterns as a primary status indicator. A firmware might use three short blinks for "GPS lock acquired," rapid flashing for "low battery," and a solid light for "armed and ready," although the exact codes differ between projects. You can encode a surprising amount of information in a single LED.

Porting to STM32F4

The beauty of this architecture is how little changes when you move to a different chip. Here is what you modify to run this on an STM32F411:

| What Changes | H743 | F411 |
| --- | --- | --- |
| Cargo feature | stm32h743vi | stm32f411ce |
| Clock config | HSE 25MHz, PLL to 480MHz | HSE 25MHz, PLL to 100MHz |
| LED pin | PE1 | PC13 |
| SPI frequency | 8 MHz (same) | 8 MHz (same) |
| Memory sections | .sram1_bss for DMA | Not needed |

The task architecture, the channel communication, the sensor reading code, the telemetry formatting — all identical. You change the chip feature, adjust the clock tree, update pin and peripheral names (the watchdog, for instance, is p.IWDG1 on the H743 but p.IWDG on the F411), and remove the H7-specific memory section attributes. The business logic does not change at all.

# H743 version
[dependencies.embassy-stm32]
features = ["stm32h743vi", "time-driver-any", "memory-x"]

# F411 version — just change the feature
[dependencies.embassy-stm32]
features = ["stm32f411ce", "time-driver-any", "memory-x"]

Summary

This sensor hub demonstrates the full Embassy pattern:

  1. One task per concern — each sensor gets its own async task running at its own rate
  2. Channels for communication — data flows from producers to consumers without shared mutable state
  3. Graceful error handling — sensor failures log warnings and signal state changes, they do not crash the system
  4. Watchdog protection — the main loop pets the watchdog; if any task blocks or panics, the system resets
  5. Portable architecture — the same task structure works across STM32 families with minimal changes

This is not a toy example. With real sensor drivers and tuned timing, this code structure runs actual drones, robots, and industrial sensors in production.

From Prototype to Production

Your code works on a dev board. The sensors read correctly, the telemetry streams, the watchdog keeps things alive. Now what? The gap between a working prototype and a reliable product is filled with hardware design, testing discipline, chip selection, and certification. This chapter covers the practical knowledge you need to cross that gap.

Power Design

Every production board needs clean, stable power. Here is the standard power chain:

Battery (7.4V–12.6V)
  │
  ├── Schottky Diode (reverse polarity protection)
  │
  ├── Bulk Capacitor (100µF electrolytic)
  │
  ├── 3.3V LDO Regulator (e.g., AMS1117-3.3)
  │     │
  │     ├── 10µF input cap
  │     ├── 10µF output cap
  │     │
  │     └── 3.3V Rail
  │           │
  │           ├── STM32 (100nF on each VDD pin)
  │           ├── IMU (100nF)
  │           ├── Barometer (100nF)
  │           └── Other ICs (100nF each)
  │
  └── 5V Buck Converter (if needed for servos/motors)

The 100nF rule: Every IC gets a 100nF ceramic decoupling capacitor on every power pin, placed as close to the pin as physically possible — within 5mm. This is not optional. Without decoupling caps, high-frequency current spikes from the digital logic create voltage dips that cause random resets, communication errors, and sensor noise.

| Component | Purpose | Placement |
|---|---|---|
| Schottky diode | Reverse polarity protection | Battery input |
| 100µF electrolytic | Bulk energy storage | After diode |
| 10µF ceramic | Regulator stability | Regulator input and output |
| 100nF ceramic | High-frequency decoupling | Every VDD pin, within 5mm |
| Ferrite bead | Isolate analog/digital power | Between VDDA and VDD |
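
A linear regulator turns the whole voltage drop into heat, so it is worth running the numbers for this power chain before committing to a layout. The sketch below uses the standard relation P = (V_in - V_out) × I_load. The 150 mA load current in the usage note is an assumed figure for illustration, and the function names are hypothetical.

```rust
/// Heat dissipated in a linear (LDO) regulator, in watts:
/// P = (V_in - V_out) * I_load
pub fn ldo_dissipation_w(v_in: f32, v_out: f32, i_load_a: f32) -> f32 {
    (v_in - v_out) * i_load_a
}

/// Approximate efficiency of a linear regulator, ignoring quiescent current:
/// eta = V_out / V_in
pub fn ldo_efficiency(v_in: f32, v_out: f32) -> f32 {
    v_out / v_in
}
```

For a fully charged 12.6 V pack feeding a 3.3 V rail at an assumed 150 mA, `ldo_dissipation_w(12.6, 3.3, 0.15)` comes to about 1.4 W and `ldo_efficiency(12.6, 3.3)` to about 26%. At 7.4 V the dissipation drops to roughly 0.6 W. Whether the regulator package can shed that heat is something to check against its datasheet.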

Fun Fact: The 100nF decoupling capacitor is the single most common component on any digital PCB. A typical STM32H7 board might have 15-20 of them just for the microcontroller alone — one for each VDD, VDDA, VDDLDO, and VREF pin.

PCB Design Tips

PCB layout is where electrical theory meets physical reality. A schematic that works perfectly in simulation can fail on a poorly laid out board.

Ground planes: Use a solid, unbroken ground plane on an inner layer. Do not route signal traces through the ground plane under your MCU. Current returns through the ground plane directly beneath the signal trace — if you cut the ground, the return current has to detour around the cut, creating an antenna.

Signal integrity for SPI: The SPI clock (SCK) is your highest-frequency signal. Keep SPI traces short — under 5cm. Route SCK and MOSI/MISO as a group, keep them away from noisy power traces. Match trace lengths if your SPI clock exceeds 20MHz.

Sensor placement: Put the IMU as far from motors and power electronics as possible. Vibration and electromagnetic interference from motors corrupt accelerometer and gyroscope readings. If you cannot get physical distance, use a separate ground pour under the IMU connected to the main ground at a single point.

Layer stackup for a 4-layer board:

| Layer | Purpose |
|---|---|
| Top | Signal traces, components |
| Inner 1 | Ground plane (unbroken) |
| Inner 2 | Power plane (3.3V, 5V) |
| Bottom | Signal traces, components |

SWD test pads: Always include SWD pads (SWDIO, SWCLK, GND, 3.3V, NRST) on your production board, even if you plan to program via UART bootloader. When something goes wrong in the field, SWD access is invaluable for debugging. Use through-hole pads or tag-connect footprints.

Think About It: A 4-layer PCB costs roughly 30-50% more than a 2-layer board in small quantities. For anything with high-speed SPI, multiple power rails, or noise-sensitive analog inputs, the 4-layer board pays for itself in debugging time you do not spend.

Testing Levels

Professional embedded development uses three testing stages. Skipping any of them is asking for field failures.

Desk Testing

This is what you have been doing throughout this book — the board on your desk, connected to a debugger, exercising each subsystem:

  • Sensor verification: Read WHO_AM_I registers, verify data ranges make sense
  • Communication check: UART telemetry at all supported baud rates, SPI at full speed
  • Watchdog validation: Deliberately trigger a hang (infinite loop), confirm the watchdog resets the system
  • Thermal test: Run the system for 24 hours continuously, monitor for drift or failures
  • Power consumption: Measure idle and active current draw, verify sleep modes work

Integration Testing

Connect the board to the full system — motors, airframe, sensors in their final mounting positions:

  • EMI testing: Do motors cause sensor noise? Does the radio interfere with I2C? Run all subsystems simultaneously at full power
  • Vibration testing: Mount the board as it will be mounted in production. Read IMU data while motors are running. Look for resonance frequencies
  • Failsafe testing: Disconnect each sensor one at a time. Does the system degrade gracefully? Pull the battery briefly — does the watchdog catch the brownout?
  • Temperature range: If your product operates outdoors, test in a freezer and under a heat lamp. STM32 industrial-grade chips are rated for -40°C to +85°C (suffix 6) or -40°C to +105°C (suffix 7), so check the suffix on the part you actually ordered

Field Testing

The system in its real environment:

  • Tethered first: For a drone, fly tethered to the ground before free flight. For a vehicle, test on a dynamometer before road testing
  • Log everything: Record all sensor data, system states, and error counts. Review logs after each test session
  • Endurance testing: Run the system for 10x the expected mission duration. If your drone flight is 15 minutes, test for 2.5 hours
  • Edge cases: Test at the boundaries — maximum altitude, minimum temperature, maximum speed, lowest battery voltage

// A simple flight log structure for field testing.
// FlightMode is assumed to be defined elsewhere and to implement defmt::Format.
#[derive(defmt::Format)]
struct FlightLog {
    timestamp_ms: u64,
    mode: FlightMode,
    battery_v: f32,
    error_count: u32,
    imu_valid: bool,
    baro_valid: bool,
}

STM32 Selection for Production

Choosing the right STM32 for a product is different from choosing one for learning. Here are the key factors:

| Factor | Consideration |
|---|---|
| Longevity | Will ST manufacture this chip for 10+ years? |
| Availability | Can you actually buy 1000 units today? |
| Package | QFP for hand soldering, BGA for density |
| Temperature range | Commercial (0 to 70°C) vs Industrial (-40 to +105°C) |
| Cost at volume | Per-unit price at 1K, 10K, 100K quantities |
| Ecosystem | HAL maturity, Embassy support, community knowledge |

STM32G0 — The go-to for cost-sensitive, low-complexity products. Widely available, extremely affordable, good Embassy support. Perfect for sensor nodes, simple controllers, LED drivers.

STM32F4 — The workhorse. The F411 and F407 have been in production for over a decade, are available from every distributor, and have the largest community. If your product needs an FPU and moderate performance, the F4 is a safe bet.

STM32H5 — ST's newest mainstream high-performance family. Designed as the long-term successor to the F4/F7. Better power efficiency, hardware security, and guaranteed long production life.

STM32U5 — The ultra-low-power choice. If your product runs on a coin cell or needs months of battery life, the U5 family is built for it.

Fun Fact: The STM32F103 — the "Blue Pill" chip — has been in continuous production since 2007. That is nearly two decades. ST commits to a minimum 10-year production lifecycle for most STM32 families, and many go well beyond that.

Cost Tiers (Approximate, 1K Quantity)

| Tier | Families | Approximate INR | Use Case |
|---|---|---|---|
| Budget | F0, G0, L0 | 50-100 | Simple sensors, LED control |
| Mainstream | F1, F4, G4 | 150-350 | Motor control, data logging |
| High Performance | F7, H7, H5 | 500-1200 | Signal processing, multi-sensor fusion |
| Ultra Low Power | L4, U5 | 200-500 | Battery-powered, wearables |

Certification Considerations

If you sell a product, it needs to pass regulatory certification. The requirements depend on your market:

CE marking (Europe) and FCC (USA) — Required for any electronic device. Covers electromagnetic emissions (your board does not interfere with other devices) and immunity (other devices do not break yours). Good PCB design with proper ground planes and decoupling gets you most of the way there.

DO-178C (Aviation) — If your embedded system goes in an aircraft, the software must be developed under this standard. Rust's type safety and memory safety are increasingly recognized as beneficial, but the tooling certification story is still evolving.

IEC 61508 (Industrial Safety) — For safety-instrumented systems in industrial environments. Requires formal hazard analysis and systematic software development practices.

The good news: Rust's compile-time guarantees — no null pointer dereferences, no buffer overflows, no data races — give you a head start on the safety arguments for any of these standards.

Supply Chain

The semiconductor shortage of 2021-2023 taught the industry painful lessons. For production:

  • Second source: If possible, design your board to accept two pin-compatible STM32 variants. The F411CE and F401CE share the same pinout
  • Buy ahead: Once your design is finalized, buy 6-12 months of inventory for critical components
  • Distributor relationships: Work with authorized distributors (Mouser, DigiKey, Farnell). Avoid the gray market for production components
  • Design for availability: Use common packages (LQFP-64, LQFP-48) rather than exotic ones. Common parts are restocked first during shortages

Summary

Going from prototype to production is a discipline, not a single step. Get the power design right, lay out the PCB carefully, test at every level, choose chips with long production commitments, and plan for supply chain disruptions. The firmware patterns from the rest of this book give you reliable software — this chapter gives you the context to put it on reliable hardware.

Reference Patterns

This chapter is a quick-reference you will come back to again and again. It collects the target triples, feature flags, pin mappings, and debugging checklists that save you from searching through documentation at 2am.

Target Triples

Every ARM Cortex-M core has a specific Rust compilation target. Using the wrong one means your binary will not run — or worse, will run with subtle bugs.

| Target Triple | Core | FPU | Used By |
|---|---|---|---|
| thumbv6m-none-eabi | Cortex-M0, M0+ | No | F0, G0, L0 |
| thumbv7m-none-eabi | Cortex-M3 | No | F1, F2, L1 |
| thumbv7em-none-eabi | Cortex-M4, M7 (FPU unused) | No | F3, F4, G4, L4 and others, only if the FPU is deliberately left unused |
| thumbv7em-none-eabihf | Cortex-M4, M7 | Yes | F3, F4, F7, H7, G4, L4+ |
| thumbv8m.main-none-eabihf | Cortex-M33 | Yes | H5, U5, L5 |

The hf suffix means "hardware float." If your chip has an FPU (most F4, F7, H7, G4 do), always use the hf target. Using the non-hf target on an FPU-equipped chip works but wastes the hardware floating point unit — all float math gets done in software, roughly 10-50x slower.

Think About It: The target triple encodes the instruction set, not the chip model. An STM32F407 and an STM32F411 use the same target (thumbv7em-none-eabihf) because they both have Cortex-M4F cores. An STM32G071 uses a different target (thumbv6m-none-eabi) because it has a Cortex-M0+ core.
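
The target triple table can be written down as a small lookup, which makes the rule explicit: the core picks the instruction set, and the FPU choice picks the `hf` suffix. This is a desktop-runnable sketch; the `Core` enum and `target_triple` function are hypothetical helpers, not part of any toolchain.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Core {
    M0,
    M0Plus,
    M3,
    M4,
    M7,
    M33,
}

/// Map a Cortex-M core (and whether the FPU should be used) to a Rust target.
/// Cores without an FPU ignore the `use_fpu` flag.
pub fn target_triple(core: Core, use_fpu: bool) -> &'static str {
    match (core, use_fpu) {
        (Core::M0 | Core::M0Plus, _) => "thumbv6m-none-eabi",
        (Core::M3, _) => "thumbv7m-none-eabi",
        (Core::M4 | Core::M7, false) => "thumbv7em-none-eabi",
        (Core::M4 | Core::M7, true) => "thumbv7em-none-eabihf",
        (Core::M33, false) => "thumbv8m.main-none-eabi",
        (Core::M33, true) => "thumbv8m.main-none-eabihf",
    }
}
```

An STM32F411 (Cortex-M4F) maps to `target_triple(Core::M4, true)`, and an STM32G071 (Cortex-M0+) maps to `target_triple(Core::M0Plus, false)`.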

Embassy Chip Feature Flags

Embassy uses Cargo feature flags to select the exact chip model. The feature name matches the chip part number in lowercase. Here are the most common ones:

| Board / Chip | Embassy Feature | Target | Notes |
|---|---|---|---|
| Blue Pill (F103C8) | stm32f103c8 | thumbv7m-none-eabi | Classic learning board |
| Black Pill (F411CE) | stm32f411ce | thumbv7em-none-eabihf | Great value, USB OTG |
| STM32F407 Discovery | stm32f407vg | thumbv7em-none-eabihf | 4 LEDs, audio codec |
| Nucleo-G071RB | stm32g071rb | thumbv6m-none-eabi | Budget, lots of peripherals |
| Nucleo-H743ZI | stm32h743zi | thumbv7em-none-eabihf | 480MHz, complex memory |
| Nucleo-U575ZI | stm32u575zi | thumbv8m.main-none-eabihf | Ultra-low-power |
| Nucleo-H563ZI | stm32h563zi | thumbv8m.main-none-eabihf | H5 mainstream |
| WeAct STM32G431 | stm32g431cb | thumbv7em-none-eabihf | Motor control, USB-C |

# In Cargo.toml — change only this feature to switch chips
[dependencies.embassy-stm32]
version = "0.2"
features = ["stm32f411ce", "time-driver-any", "memory-x"]

Common LED Pins by Board

When you are trying to blink an LED and nothing happens, the first question is: which pin is the LED on?

| Board | LED Pin | Active | Color |
|---|---|---|---|
| Blue Pill (F103) | PC13 | Low | Green |
| Black Pill (F411) | PC13 | Low | Blue |
| STM32F407 Discovery | PD12, PD13, PD14, PD15 | High | Green, Orange, Red, Blue |
| Nucleo-64 (most) | PA5 | High | Green |
| Nucleo-144 (most) | PB0, PB7, PB14 | High | Green, Blue, Red |
| WeAct H743 | PE3 | Low | Blue |

Fun Fact: The "active low" LED convention (PC13 on Blue/Black Pill) means you set the pin LOW to turn the LED ON. This is because the LED is wired between VCC and the GPIO pin. When the pin is low, current flows through the LED. When the pin is high, both sides are at the same voltage and no current flows.
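
Because the polarity differs from board to board, it can help to keep "LED on" and "pin high" as separate ideas in code. The sketch below is a minimal desktop-runnable illustration; `Polarity` and `pin_is_high` are hypothetical names, and on real hardware the result would decide between `set_high()` and `set_low()`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Polarity {
    /// LED between the GPIO pin and ground: pin high turns it on.
    ActiveHigh,
    /// LED between VCC and the GPIO pin: pin low turns it on.
    ActiveLow,
}

/// Returns true if the pin must be driven high to get the requested LED state.
pub fn pin_is_high(led_on: bool, polarity: Polarity) -> bool {
    match polarity {
        Polarity::ActiveHigh => led_on,
        Polarity::ActiveLow => !led_on,
    }
}
```

For the Black Pill's PC13 LED, `pin_is_high(true, Polarity::ActiveLow)` is `false`: the pin goes low to light the LED.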

Debug Checklist

When something does not work, walk through this checklist in order. Each item is listed roughly by how often it causes problems:

Power and Hardware

  • Power connected? Check 3.3V rail with a multimeter. Should be 3.2V–3.4V.
  • Debugger connected? ST-Link or J-Link wired to SWDIO, SWCLK, GND. Is the debugger LED solid?
  • Correct chip selected? The Embassy feature flag must match your physical chip exactly.
  • Reset pin? Some boards need NRST connected or have a reset button you need to press.

Basic Firmware

  • LED blinks? Before debugging any peripheral, confirm the chip is running code at all.
  • defmt logs appearing? If not, check that defmt-rtt is in your dependencies and probe-rs is configured correctly.
  • Clock configuration correct? Wrong PLL settings can make the chip run at the wrong speed or not run at all. Start with the default clock (HSI) to confirm basic operation.

GPIO and Pins

  • Correct pin for this function? Check the datasheet's alternate function mapping.
  • Correct alternate function number? PA9 might be USART1_TX on AF7 but SPI1_SCK on AF5. Getting the AF wrong means the peripheral is not connected to the pin.
  • Pin mode correct? Push-pull vs open-drain, pull-up vs pull-down.

Communication Peripherals

  • SPI: CS pin toggling? Scope it. If CS is stuck high, the slave ignores everything.
  • SPI: Clock polarity and phase? CPOL and CPHA must match the slave device's requirements.
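  • I2C: Pull-up resistors present? I2C requires external pull-ups (typically 4.7k) to 3.3V on SDA and SCL. Both lines are open-drain, so without pull-ups nothing ever drives them high and the bus appears stuck low.
  • I2C: Correct address? Some datasheets give 8-bit addresses (the 7-bit address shifted left, with the read/write bit in bit 0), while Rust drivers expect the 7-bit form. Shift the 8-bit value right by one bit (divide by 2) if needed.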
  • UART: TX and RX swapped? The most common UART mistake. Your TX connects to their RX and vice versa.
  • UART: Baud rate match? Both sides must agree on the baud rate exactly.
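
The I2C address item is the one most easily checked with a line of arithmetic. This sketch shows the conversion from an 8-bit datasheet address to the 7-bit form; the function names are hypothetical, and 0xD0/0xD1 are used only as an example write/read address pair.

```rust
/// Convert an 8-bit I2C address (7-bit address shifted left, R/W in bit 0)
/// to the 7-bit address that Rust I2C APIs expect.
pub fn to_7bit(addr_8bit: u8) -> u8 {
    addr_8bit >> 1
}

/// A 7-bit address must fit in the low seven bits.
pub fn is_valid_7bit(addr: u8) -> bool {
    addr <= 0x7F
}
```

Both the write address 0xD0 and the read address 0xD1 reduce to the same 7-bit address, 0x68. If a value from a datasheet fails `is_valid_7bit`, it is almost certainly in the 8-bit form.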

STM32H7 Specific

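  • DMA buffer in correct memory region? DMA1 and DMA2 cannot reach DTCM. SRAM1/2/3 is the simplest choice; AXI SRAM is reachable but sits behind the data cache.
  • Cache coherency? If using AXI SRAM with DMA and the data cache is enabled, clean the cache before the DMA reads a buffer and invalidate it after the DMA writes one.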
  • Power supply configuration? H7 has SMPS and LDO options. Wrong power configuration prevents boot.
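
Cache maintenance on the Cortex-M7 works on whole 32-byte cache lines, so a DMA buffer that shares a line with other data can be corrupted by an invalidate. The sketch below shows only the address arithmetic, runnable on a desktop; the function names are hypothetical and no actual cache operation is performed.

```rust
/// Cortex-M7 data cache line size in bytes.
pub const CACHE_LINE: usize = 32;

/// Expand the byte range [addr, addr + len) to whole cache lines.
/// Returns (aligned_start, aligned_length).
pub fn cache_line_span(addr: usize, len: usize) -> (usize, usize) {
    let start = addr & !(CACHE_LINE - 1);
    let end = (addr + len + CACHE_LINE - 1) & !(CACHE_LINE - 1);
    (start, end - start)
}

/// True if the buffer starts on a cache line and covers whole lines only,
/// so no unrelated data shares a line with it.
pub fn is_cache_aligned(addr: usize, len: usize) -> bool {
    addr % CACHE_LINE == 0 && len % CACHE_LINE == 0
}
```

A 40-byte buffer at 0x2400_0010 (inside AXI SRAM) actually touches the 64 bytes starting at 0x2400_0000, so invalidating it would also discard whatever else lives in those two lines. Aligning and padding the buffer to 32 bytes avoids the problem.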

System Level

  • Watchdog timeout too short? If the watchdog fires during initialization, the chip resets in a loop.
  • Stack overflow? Large local arrays can overflow the stack. Use static for big buffers. Enabling the paint-stack feature in cortex-m-rt fills the stack with a known pattern so you can measure how much of it has been used.
  • Flash full? cargo size shows your binary size. If it exceeds your chip's Flash, linking fails with an obscure error.
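
For the watchdog item, it helps to compute the timeout rather than guess. The independent watchdog counts down from a 12-bit reload value at the LSI clock divided by a prescaler, so the timeout is prescaler × (reload + 1) / f_LSI. The sketch below assumes a nominal 32 kHz LSI, which in practice varies noticeably between chips and with temperature; the function names are hypothetical and the prescaler is not validated against the hardware's allowed values (4 to 256 in powers of two).

```rust
/// Nominal LSI frequency in Hz (assumption: the real value drifts).
pub const LSI_HZ: u64 = 32_000;
/// The IWDG reload register is 12 bits wide.
pub const MAX_RELOAD: u64 = 0xFFF;

/// Watchdog timeout in milliseconds for a given prescaler and reload value.
pub fn iwdg_timeout_ms(prescaler: u64, reload: u64) -> u64 {
    prescaler * (reload + 1) * 1_000 / LSI_HZ
}

/// Smallest reload value giving at least `timeout_ms`, or None if the
/// prescaler is too small for that timeout to fit in 12 bits.
pub fn reload_for_timeout(prescaler: u64, timeout_ms: u64) -> Option<u64> {
    let divisor = prescaler * 1_000;
    // Round up so the real timeout is never shorter than requested.
    let ticks = (timeout_ms * LSI_HZ + divisor - 1) / divisor;
    let reload = ticks.saturating_sub(1);
    (reload <= MAX_RELOAD).then_some(reload)
}
```

With a prescaler of 32, a one-second timeout needs a reload of 999. With a prescaler of 4 the same request returns `None`, and the longest possible timeout (prescaler 256, reload 4095) is about 32.8 seconds. If initialization takes longer than the computed timeout, the reset loop described above is the result.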

Common Error Messages and Fixes

| Error Message or Symptom | Likely Cause | Fix |
|---|---|---|
| Error: connecting to the chip was unsuccessful | Debugger wiring, chip not powered, wrong chip in probe-rs config | Check wiring, power, and probe-rs chip setting |
| SP is not 8-byte aligned | Corrupt vector table or wrong memory.x | Verify FLASH and RAM origins in memory.x |
| HardFault | Null pointer, stack overflow, or invalid memory access | Enable defmt panic handler, check stack size, review unsafe blocks |
| Garbled defmt output | defmt version mismatch between crate and probe-rs | Run cargo update and update probe-rs |
| Output starts, then goes silent | Panic during initialization | Simplify init code, add defmt::info at each step to find where it stops |
| Build error: stm32f411ce not found | Wrong feature flag or embassy version | Check embassy-stm32 version supports your chip |
| Startup output repeating | Watchdog reset loop | Increase watchdog timeout or pet it earlier in init |
| Startup output repeating continuously | Boot loop from brownout | Check power supply voltage and decoupling capacitors |

Think About It: Most embedded debugging follows a pattern: confirm the obvious first (power, wiring, chip selection), then narrow down to the specific subsystem. Resist the urge to change code before confirming the hardware is correct.

Quick Pin Reference Template

When starting a new project, fill in this table for your specific board:

Project: ____________________
Board:   ____________________
Chip:    ____________________

| Function      | Pin  | AF# | Notes           |
|---------------|------|-----|-----------------|
| SPI1_SCK      |      |     |                 |
| SPI1_MOSI     |      |     |                 |
| SPI1_MISO     |      |     |                 |
| SPI1_CS       |      |     | Manual GPIO     |
| I2C1_SCL      |      |     | 4.7k pull-up    |
| I2C1_SDA      |      |     | 4.7k pull-up    |
| USART1_TX     |      |     |                 |
| USART1_RX     |      |     |                 |
| ADC1_CH0      |      |     |                 |
| LED           |      |     | Active High/Low |
| SWD_IO        |      |     |                 |
| SWD_CLK       |      |     |                 |

Debugging with probe-rs

Quick command reference for the tools you will use most often:

# Flash and run with defmt output
cargo run --release

# Just flash without running
probe-rs download target/thumbv7em-none-eabihf/release/my-project --chip STM32F411CE

# Reset the chip
probe-rs reset --chip STM32F411CE

# List connected probes
probe-rs list

# Check what chip is connected
probe-rs info

Summary

Keep this chapter bookmarked. The target triples table, the debug checklist, and the error message reference will save you hours over the course of any embedded Rust project. When something breaks — and it will — start at the top of the checklist and work down methodically.

Appendix A: STM32 Series Comparison

This table covers every major STM32 family as of 2025. Use it to pick the right chip for your project. Prices are approximate for single-unit quantities in Indian Rupees (INR) and will vary by distributor and availability.

The Big Table

| Series | Core | Max MHz | RAM | Flash | FPU | ~INR | Best For |
|---|---|---|---|---|---|---|---|
| F0 | Cortex-M0 | 48 | 4–32KB | 16–256KB | No | 60–120 | Cheapest STM32, simple control |
| G0 | Cortex-M0+ | 64 | 8–144KB | 16–512KB | No | 50–150 | Modern budget replacement for F0 |
| L0 | Cortex-M0+ | 32 | 2–20KB | 16–192KB | No | 70–150 | Ultra-low-power basic |
| F1 | Cortex-M3 | 72 | 6–96KB | 16–1024KB | No | 80–200 | Legacy workhorse (Blue Pill) |
| F3 | Cortex-M4F | 72 | 16–80KB | 64–512KB | Yes | 150–300 | Mixed-signal, motor control |
| G4 | Cortex-M4F | 170 | 32–128KB | 32–512KB | Yes | 150–400 | Modern F3 replacement, fast ADC |
| F4 | Cortex-M4F | 180 | 64–384KB | 256KB–2MB | Yes | 150–500 | The mainstream workhorse |
| L4 | Cortex-M4F | 80–120 | 64–320KB | 256KB–2MB | Yes | 200–500 | Low-power with FPU |
| F7 | Cortex-M7F | 216 | 256–512KB | 512KB–2MB | Yes | 400–900 | High-performance, L1 cache |
| H7 | Cortex-M7F | 480 | 564KB–1MB+ | 1–2MB | Yes | 500–1500 | Maximum performance |
| H5 | Cortex-M33 | 250 | 256–640KB | 512KB–2MB | Yes | 300–700 | Modern successor to F4/F7 |
| U5 | Cortex-M33 | 160 | 256–768KB | 512KB–4MB | Yes | 300–600 | Ultra-low-power high-performance |
| WB | Cortex-M4F + M0+ | 64 | 96–256KB | 256KB–1MB | Yes | 300–600 | Bluetooth 5, 802.15.4, Thread |
| WL | Cortex-M4F + M0+ | 48 | 64KB | 256KB | Yes | 250–450 | LoRa/Sub-GHz radio |

How to Read This Table

Core — Higher core number generally means more capable. M0/M0+ is the simplest (no division instruction, no advanced DSP). M4F adds hardware FPU and DSP instructions. M7F adds double-precision FPU and instruction/data caches. M33 (ARMv8-M) adds TrustZone security.

FPU — "Yes" means hardware floating-point. If your application uses f32 math (sensor fusion, PID controllers, signal processing), you want an FPU. Without one, every floating-point operation is emulated in software — 10 to 50 times slower.

RAM — Ranges shown are across the entire family. Check the specific part number for exact amounts. The H7's RAM is split across multiple regions (see Chapter 15).

Best For — A quick recommendation, not a rule. You can run a simple LED blink on an H7 — it just costs more.

Fun Fact: The STM32 family has over 1,200 individual part numbers across all series. ST claims it is the broadest 32-bit MCU portfolio in the industry.

Quick Decision Guide

  • Learning embedded Rust? Start with F4 (Black Pill) or G0 (Nucleo). Best documentation and community support.
  • Battery-powered product? L4 or U5. Designed from the ground up for microamps of sleep current.
  • Need wireless? WB for Bluetooth/Zigbee/Thread. WL for LoRa/Sub-GHz.
  • Maximum compute? H7 at 480MHz with hardware double-precision float.
  • Cost-sensitive volume production? G0 or F0. Under 100 INR per unit.
  • Long-term product design? H5 or U5. ST's newest families with the longest guaranteed production windows.
  • Motor control? G4. Built-in high-resolution timers and fast ADCs designed specifically for motor control.

Appendix B: Pin Reference — STM32H743VIT6

This appendix lists the most commonly used peripheral pin mappings for the STM32H743VIT6 in the LQFP-100 package. All alternate function (AF) numbers are from the official datasheet (DS12110). When configuring Embassy, you specify the pin and Embassy selects the correct AF automatically — but knowing the AF number helps when reading the datasheet or debugging.

SPI1

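| Function | Pin Option 1 | AF | Pin Option 2 | AF | Notes |
|---|---|---|---|---|---|
| SCK | PA5 | AF5 | PB3 | AF5 | Clock |
| MOSI | PA7 | AF5 | PD7 | AF5 | Master Out Slave In |
| MISO | PA6 | AF5 | PB4 | AF5 | Master In Slave Out |
| NSS | PA4 | AF5 | PA15 | AF5 | Chip select (usually manual GPIO) |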

SPI2

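| Function | Pin Option 1 | AF | Pin Option 2 | AF | Notes |
|---|---|---|---|---|---|
| SCK | PB10 | AF5 | PB13 | AF5 | Clock |
| MOSI | PB15 | AF5 | PC3 | AF5 | Master Out Slave In |
| MISO | PB14 | AF5 | PC2 | AF5 | Master In Slave Out |
| NSS | PB9 | AF5 | PB12 | AF5 | Chip select |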

I2C1

| Function | Pin Option 1 | AF | Pin Option 2 | AF | Notes |
|---|---|---|---|---|---|
| SCL | PB6 | AF4 | PB8 | AF4 | Clock, needs 4.7k pull-up to 3.3V |
| SDA | PB7 | AF4 | PB9 | AF4 | Data, needs 4.7k pull-up to 3.3V |

I2C2

| Function | Pin Option 1 | AF | Pin Option 2 | AF | Notes |
|---|---|---|---|---|---|
| SCL | PB10 | AF4 | PH4 | AF4 | Clock |
| SDA | PB11 | AF4 | PH5 | AF4 | Data |

Think About It: PB10 appears in both the SPI2 and I2C2 tables. A pin can only serve one alternate function at a time. If you need both SPI2 and I2C2, pick non-conflicting pin options.

USART1

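| Function | Pin Option 1 | AF | Pin Option 2 | AF | Notes |
|---|---|---|---|---|---|
| TX | PA9 | AF7 | PB6 | AF7 | Transmit |
| RX | PA10 | AF7 | PB7 | AF7 | Receive |
| CTS | PA11 | AF7 | | | Clear to send (flow control) |
| RTS | PA12 | AF7 | | | Request to send (flow control) |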

USART2

| Function | Pin Option 1 | AF | Pin Option 2 | AF | Notes |
|---|---|---|---|---|---|
| TX | PA2 | AF7 | PD5 | AF7 | Transmit |
| RX | PA3 | AF7 | PD6 | AF7 | Receive |

Fun Fact: On Nucleo-144 boards, USART3 (PD8/PD9) is typically connected to the ST-Link's virtual COM port. This is the UART you see when you open a serial terminal over USB, even though the STM32 itself is using USART3 — not USB.

TIM1 (Advanced Timer — PWM Capable)

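| Function | Pin Option 1 | AF | Pin Option 2 | AF | Notes |
|---|---|---|---|---|---|
| CH1 | PA8 | AF1 | PE9 | AF1 | PWM output 1 |
| CH2 | PA9 | AF1 | PE11 | AF1 | PWM output 2 |
| CH3 | PA10 | AF1 | PE13 | AF1 | PWM output 3 |
| CH4 | PA11 | AF1 | PE14 | AF1 | PWM output 4 |
| CH1N | PA7 | AF1 | PB13 | AF1 | Complementary output 1 |
| CH2N | PB0 | AF1 | PB14 | AF1 | Complementary output 2 |
| CH3N | PB1 | AF1 | PB15 | AF1 | Complementary output 3 |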

ADC1 Channels

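| Channel | Pin | Notes |
|---|---|---|
| IN0 | PA0 | |
| IN1 | PA1 | |
| IN2 | PA2 | Shared with USART2_TX |
| IN3 | PA3 | Shared with USART2_RX |
| IN4 | PA4 | Shared with SPI1_NSS |
| IN5 | PA5 | Shared with SPI1_SCK |
| IN6 | PA6 | Shared with SPI1_MISO |
| IN7 | PA7 | Shared with SPI1_MOSI |
| IN8 | PB0 | |
| IN9 | PB1 | |
| IN10 | PC0 | |
| IN15 | PA0_C | Direct ADC channel (H7 specific) |
| IN16 | | Internal temperature sensor |
| IN17 | | Internal VBAT/4 |
| IN18 | | Internal VREFINT |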

USB OTG FS

| Function | Pin | AF | Notes |
|---|---|---|---|
| DM | PA11 | AF10 | USB Data Minus |
| DP | PA12 | AF10 | USB Data Plus |
| ID | PA10 | AF10 | OTG ID (host/device detection) |

Pin Conflict Quick Check

Before finalizing your pin assignments, watch for these common conflicts on the H743:

| Conflict | Pins | Resolution |
|---|---|---|
| SPI1 MOSI vs ADC1_IN7 | PA7 | Use one or the other, not both simultaneously |
| USART1 TX vs I2C1 SCL | PB6 | Pick alternate pin option for one peripheral |
| TIM1_CH2 vs USART1_TX | PA9 | Cannot use both on PA9; use PE11 for TIM1_CH2 |
| USB DM vs TIM1_CH4 | PA11 | If using USB, move TIM1_CH4 to PE14 |

When in doubt, open STM32CubeMX, select your chip, and enable your peripherals. CubeMX will highlight pin conflicts in red.
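
The same check can be automated for a pin plan kept in source control. The sketch below groups a list of (function, pin) assignments by pin and reports any pin claimed more than once. It is a desktop-side helper with a hypothetical name, `find_conflicts`, and it only catches duplicates in the list you give it; it knows nothing about which alternate functions a chip really offers.

```rust
use std::collections::BTreeMap;

/// Return every pin that is claimed by more than one function,
/// together with the functions that claim it. Pins are sorted by name.
pub fn find_conflicts<'a>(
    assignments: &[(&'a str, &'a str)],
) -> Vec<(&'a str, Vec<&'a str>)> {
    let mut by_pin: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &(function, pin) in assignments {
        by_pin.entry(pin).or_default().push(function);
    }
    by_pin
        .into_iter()
        .filter(|(_, users)| users.len() > 1)
        .collect()
}
```

Given the plan `[("USART1_TX", "PA9"), ("TIM1_CH2", "PA9"), ("SPI1_SCK", "PA5")]`, the function reports PA9 with both claimants, which matches the TIM1_CH2 vs USART1_TX row in the conflict table.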

Appendix C: Cargo.toml Template

This appendix provides a complete, working Cargo.toml and .cargo/config.toml for an Embassy project targeting the STM32H743. Copy it as your starting point, then adjust the chip feature flag for your board.

Cargo.toml

[package]
name = "sensor-hub"
version = "0.1.0"
edition = "2021"

[dependencies]
# Embassy runtime
embassy-executor = { version = "0.7", features = ["arch-cortex-m", "executor-thread"] }
embassy-stm32 = { version = "0.2", features = ["stm32h743vi", "time-driver-any", "memory-x"] }
embassy-time = { version = "0.4", features = ["tick-hz-32_768"] }
embassy-sync = "0.6"

# Logging with defmt
defmt = "0.3"
defmt-rtt = "0.4"

# Cortex-M support
cortex-m = { version = "0.7", features = ["critical-section-single-core"] }
cortex-m-rt = "0.7"

# Panic handler — logs panic message via defmt, then halts
panic-probe = { version = "0.3", features = ["print-defmt"] }

# Fixed-size data structures (no heap)
heapless = "0.8"

# Embedded HAL traits
embedded-hal = "1.0"
embedded-hal-async = "1.0"

# Optional: static_cell for safe static initialization
static_cell = "2"

[profile.release]
opt-level = "s"       # Optimize for size
debug = 2             # Keep debug info for defmt
lto = "fat"           # Link-time optimization
codegen-units = 1     # Better optimization, slower compile

.cargo/config.toml

[target.thumbv7em-none-eabihf]
runner = "probe-rs run --chip STM32H743VITx"

[build]
target = "thumbv7em-none-eabihf"

[env]
DEFMT_LOG = "debug"

Switching to a Different Chip

To target a different STM32, change up to three things: the chip feature, the build target (only if the core differs), and the probe-rs chip name.

STM32F411CE (Black Pill)

# In Cargo.toml — change the embassy-stm32 feature
embassy-stm32 = { version = "0.2", features = ["stm32f411ce", "time-driver-any", "memory-x"] }

# In .cargo/config.toml: same target (both are Cortex-M4F)
[target.thumbv7em-none-eabihf]
runner = "probe-rs run --chip STM32F411CEUx"

STM32G071RB (Nucleo — Cortex-M0+)

# In Cargo.toml
embassy-stm32 = { version = "0.2", features = ["stm32g071rb", "time-driver-any", "memory-x"] }

# Also change cortex-m critical section feature if needed
cortex-m = { version = "0.7", features = ["critical-section-single-core"] }

# In .cargo/config.toml: different target for M0+
[target.thumbv6m-none-eabi]
runner = "probe-rs run --chip STM32G071RBTx"

[build]
target = "thumbv6m-none-eabi"

STM32U575ZI (Nucleo — Cortex-M33)

# In Cargo.toml
embassy-stm32 = { version = "0.2", features = ["stm32u575zi", "time-driver-any", "memory-x"] }

# In .cargo/config.toml: ARMv8-M target
[target.thumbv8m.main-none-eabihf]
runner = "probe-rs run --chip STM32U575ZITxQ"

[build]
target = "thumbv8m.main-none-eabihf"

Quick Reference: What to Change

| Item | Where | Example |
|---|---|---|
| Chip model | Cargo.toml embassy-stm32 features | stm32f411ce |
| Build target | .cargo/config.toml [build] | thumbv7em-none-eabihf |
| probe-rs chip | .cargo/config.toml runner | STM32F411CEUx |

Fun Fact: The probe-rs chip name uses the full ST part number suffix (like STM32F411CEUx), where U is the package type (UFQFPN) and x is a wildcard for temperature range. You can find the exact string by running probe-rs chip list | grep -i f411.

Minimal main.rs to Verify Setup

After creating these files, use this minimal main.rs to confirm everything compiles and runs:

#![no_std]
#![no_main]

use embassy_executor::Spawner;
use embassy_time::Timer;
use defmt::*;
use defmt_rtt as _;
use panic_probe as _;

#[embassy_executor::main]
async fn main(_spawner: Spawner) {
    let _p = embassy_stm32::init(Default::default());
    info!("Hello from Embassy!");

    loop {
        info!("tick");
        Timer::after_secs(1).await;
    }
}

If you see "Hello from Embassy!" and "tick" in your probe-rs output, your toolchain is working correctly.

Appendix D: Learning Path

This is a week-by-week study plan for working through this book. It assumes you can dedicate around 5-8 hours per week — some reading, some coding, some staring at a blinking LED wondering why it stopped blinking.

The Plan

| Weeks | Chapters | Focus | Milestone |
|---|---|---|---|
| 1–2 | 1–4 | Setup and first blink | LED blinks, defmt logs appear in terminal |
| 3–4 | 5–6 | GPIO and clocks | Button toggles LED, clock configured for full speed |
| 5–6 | 7–8 | Timers and interrupts | PWM-controlled LED brightness, button interrupt |
| 7–8 | 9–10 | UART and SPI | Serial terminal communication, SPI sensor reading |
| 9–10 | 11–14 | I2C, ADC, DMA, Watchdog | I2C sensor working, ADC reads voltage, DMA transfer, watchdog active |
| 11–12 | 15–17 | Memory, patterns, sensor hub | Complete multi-sensor system running |
| 13+ | 18–19 | Production and reference | Understand production considerations, have a reference to return to |

Week-by-Week Details

Weeks 1–2: Getting Started (Chapters 1–4)

Goal: Get a working development environment and blink an LED.

This is where most people get stuck — not because the concepts are hard, but because toolchain setup has many small steps that must all be correct. Be patient. Once the LED blinks, you have crossed the hardest part.

  • Install Rust, probe-rs, and your IDE
  • Wire up your dev board and debugger
  • Create the project from the Cargo.toml template (Appendix C)
  • Flash and run, see defmt output
  • Blink an LED — celebrate, you have earned it

Weeks 3–4: GPIO and Clocks (Chapters 5–6)

Goal: Read a button, control an LED, understand the clock tree.

  • Read a physical button with debouncing
  • Configure the PLL to run the chip at full speed
  • Understand HSI vs HSE, PLL multiplication and division
  • Measure the actual clock frequency with a scope or timer

Weeks 5–6: Timers and Interrupts (Chapters 7–8)

Goal: Generate PWM, respond to hardware events asynchronously.

  • Fade an LED up and down with PWM
  • Set up a periodic timer interrupt
  • Understand Embassy's async model as an alternative to raw interrupts
  • Use a Ticker for precise periodic tasks

Weeks 7–8: Communication — UART and SPI (Chapters 9–10)

Goal: Talk to a computer and talk to a sensor.

  • Send and receive data over UART at 115200 baud
  • Read a SPI sensor (IMU or Flash memory)
  • Understand clock polarity, phase, and chip select
  • Use DMA for UART transmission

Weeks 9–10: More Peripherals (Chapters 11–14)

Goal: Use I2C, ADC, DMA, and the watchdog.

This is a dense two weeks. Four chapters with four different peripherals. Take it one at a time:

  • Week 9: I2C sensor (barometer or temperature) + ADC reading a potentiometer or battery voltage
  • Week 10: DMA for efficient data transfers + watchdog timer for system reliability

Weeks 11–12: Putting It All Together (Chapters 15–17)

Goal: Build the complete sensor hub from Chapter 17.

  • Understand memory architecture (especially if using H7)
  • Learn the embedded Rust patterns from Chapter 16
  • Combine everything into the multi-task sensor hub
  • Test each subsystem individually, then run everything together

Week 13 and Beyond (Chapters 18–19)

Goal: Think about production and keep the reference handy.

  • Read Chapter 18 to understand PCB design, testing, and chip selection
  • Bookmark Chapter 19 as your debugging reference
  • Start designing your own project

Think About It: The most important thing is to keep going when something does not work. Every embedded developer has spent an evening debugging a wiring mistake or a wrong clock configuration. The debug checklist in Chapter 19 exists because we have all been there.

Tips for Self-Study

  • Build every example. Reading code is not the same as running code. Type it out, flash it, and observe the result.
  • Break things on purpose. Set the wrong baud rate and see what happens. Swap SPI clock polarity. Remove the I2C pull-up resistors. Understanding failure modes is as important as understanding success.
  • Keep a lab notebook. Write down what you tried, what worked, and what did not. When you hit a similar problem six months later, your notes will be invaluable.
  • Join the community. The Embassy Matrix chat and the Rust Embedded Working Group are welcoming to beginners. No question is too basic.

Appendix E: Resources

A curated list of references you will come back to again and again.

Official Documentation

Resource | Description
RM0433 (H743 Reference Manual) | The 3,300-page bible for H7. Every register, every peripheral. Download from st.com
RM0090 (F4 Reference Manual) | The classic F4 reference, covering the F405/F407/F427/F429 lines. Simpler memory model, excellent for learning. The STM32F411 is documented separately in RM0383
DS12110 (STM32H743 Datasheet) | Pin assignments, electrical characteristics, package drawings
PM0253 (Cortex-M7 Programming Manual) | ARM core details, instruction set, NVIC, SysTick, MPU

Rust Embedded Ecosystem

Resource | URL | Description
Embassy | embassy.dev | Official docs, examples, and API reference for the Embassy async framework
Embassy GitHub | github.com/embassy-rs/embassy | Source code and extensive examples for every STM32 family
The Embedded Rust Book | docs.rust-embedded.org/book | The official guide to bare-metal Rust. Start here if Embassy feels too magical
The Discovery Book | docs.rust-embedded.org/discovery | Hands-on tutorial with an STM32F3 Discovery board
embedded-hal | docs.rs/embedded-hal | The trait definitions that make portable drivers possible
probe-rs | probe.rs | The debugger/flasher toolchain. Replaces OpenOCD for Rust workflows
defmt | defmt.ferrous-systems.com | Efficient logging framework designed for microcontrollers
heapless | docs.rs/heapless | Fixed-size collections for no-alloc environments

Tools

Tool | Purpose
STM32CubeMX | ST's graphical pin/clock configurator. Useful for verifying pin assignments even if you never generate C code
Compiler Explorer (godbolt.org) | See what assembly your Rust code compiles to. Invaluable for understanding zero-cost abstractions
cargo-binutils | cargo size, cargo objdump, cargo nm: inspect your binary's size and contents
cargo-flash | Flash firmware without a full debug session
Serial Monitor | Any serial terminal (minicom, screen, PuTTY, or the VS Code serial monitor extension) for UART debugging

Community

Channel | Description
Embassy Matrix Chat | matrix.to/#/#embassy-rs:matrix.org (the most active place for Embassy questions)
Rust Embedded Matrix | matrix.to/#/#rust-embedded:matrix.org (broader embedded Rust discussion)
r/rust | Reddit community, search for "embedded" or "stm32"
STM32 Forums | ST's official community forums, useful for hardware-specific questions

A suggested order for using these resources:

  1. Start with this book (Chapters 1–8) and the Embassy examples
  2. When something feels like magic, read the corresponding section of The Embedded Rust Book
  3. When a peripheral misbehaves, open the Reference Manual for that peripheral's chapter
  4. When optimizing, use Compiler Explorer and cargo-binutils to understand what the compiler produces
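As a starting point for the last step, here is a small pair of functions you could paste into Compiler Explorer to compare an iterator chain with a hand-written loop. The function names are illustrative. Whether the two compile to the same instructions depends on the compiler version, target, and optimization flags, which is exactly what the tool lets you check.

```rust
/// Sum of squares written with iterator adapters.
pub fn sum_of_squares_iter(samples: &[u16]) -> u32 {
    samples
        .iter()
        .map(|&s| u32::from(s) * u32::from(s))
        .sum()
}

/// The same computation written as an explicit indexed loop.
/// The indexing is deliberate: it lets you look for bounds checks
/// in the generated assembly.
pub fn sum_of_squares_loop(samples: &[u16]) -> u32 {
    let mut total = 0u32;
    for i in 0..samples.len() {
        let s = u32::from(samples[i]);
        total += s * s;
    }
    total
}
```

Both functions return the same value for the same input. Compile them with optimizations enabled and compare the two assembly listings side by side.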