Chapter 2: What Is a Microcontroller?
You are reading this on a computer that has a CPU, gigabytes of RAM, a hard drive, a GPU, a network card, and an operating system managing it all. A microcontroller is all of that squeezed onto a single chip the size of your fingernail — minus the luxury.
A Complete Computer on a Chip
A microcontroller (MCU) is a self-contained computer on a single integrated circuit. It has:
- A CPU to execute instructions
- Flash memory to store your program (survives power loss)
- RAM to store variables while running
- Input/Output peripherals to interact with the physical world
All of this costs between 50 and 500 INR, draws milliwatts of power, and fits in a package smaller than a postage stamp.
Fun Fact: There are more microcontrollers in your house than people on your street. Your washing machine, microwave, remote control, thermostat, elevator, car key fob, and electric toothbrush all contain at least one. A modern car has 50-100 of them.
How a Microcontroller Program Works
Every microcontroller program follows the same structure:
```text
Power on
  → Initialize hardware (configure clocks, pins, peripherals)
  → Loop forever:
      1. Read inputs (buttons, sensors, communication)
      2. Process (make decisions, calculate)
      3. Write outputs (LEDs, motors, communication)
```
There is no operating system. No bootloader menu. No login screen. The moment power is applied, the core loads the reset handler's address from the vector table at the start of Flash and begins executing your program. It runs until power is removed or the universe ends, whichever comes first.
Your program lives in Flash (non-volatile — it persists without power). Your variables live in RAM (volatile — they vanish the instant power is cut). This is why your microcontroller remembers its program after unplugging but starts with fresh variable values every boot.
In Embassy, this looks like:
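```rust
#![no_std]
#![no_main]

use embassy_executor::Spawner;
use embassy_stm32::gpio::{Input, Level, Output, Pull, Speed};
use embassy_time::Timer;
// Logging transport and panic handler (an assumption; any equivalents work).
use {defmt_rtt as _, panic_probe as _};

#[embassy_executor::main]
async fn main(_spawner: Spawner) {
    // 1. Initialize hardware
    let p = embassy_stm32::init(Default::default());
    let mut led = Output::new(p.PC13, Level::High, Speed::Low);
    let button = Input::new(p.PA0, Pull::Up);

    // 2. Loop forever
    loop {
        // Read input
        if button.is_low() {
            // Process + write output
            led.set_low(); // LED on (active low)
        } else {
            led.set_high(); // LED off
        }
        Timer::after_millis(10).await;
    }
}
```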
What Is Inside an STM32
Let us open up the block diagram and look at the major components.
CPU Core: ARM Cortex-M
Every STM32 microcontroller is built around an ARM Cortex-M processor core. ARM does not make chips; it designs processor cores and licenses them to manufacturers like STMicroelectronics. This is why STM32, nRF52, RP2040, and dozens of other chips all run variants of the same ARM Thumb instruction set.
Clock speeds range from 48 MHz on entry-level parts to 480 MHz on the H7 series. For context, the original IBM PC ran at 4.77 MHz, so even a "slow" 48 MHz microcontroller is clocked about ten times faster.
Flash Memory (64 KB - 2 MB)
This is where your compiled program lives. Flash is non-volatile (survives power cycles) but has limited write endurance — typically 10,000 write/erase cycles. You write to it when flashing firmware, not during normal operation.
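To see why Flash makes poor scratch storage, a back-of-the-envelope estimate helps. The sketch below assumes the 10,000-cycle figure applies to a single Flash sector that is erased and rewritten at a steady rate. The function name and the write rates are made up for illustration and do not come from any datasheet.

```rust
/// Rough lifetime estimate for one Flash sector, in days.
///
/// `endurance_cycles` is the rated number of write/erase cycles.
/// `writes_per_day` is how often the same sector is erased and rewritten.
fn flash_lifetime_days(endurance_cycles: u32, writes_per_day: u32) -> f64 {
    endurance_cycles as f64 / writes_per_day as f64
}
```

Under these assumptions, saving a value to the same sector once a minute (1,440 writes per day) would use up the rated endurance in about a week, while reflashing firmware 20 times a day during development would take 500 days to get there.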
RAM (8 KB - 1 MB)
This is your working memory for variables, buffers, and the stack. It is fast and has unlimited write cycles, but it loses everything when power is removed. Sounds small? A well-written embedded program can do remarkable things in 20 KB of RAM.
Peripherals
This is where it gets interesting. Peripherals are hardware blocks built into the chip that handle specific tasks:
| Peripheral | What It Does |
|---|---|
| GPIO | General Purpose I/O — read buttons, drive LEDs, toggle any pin |
| UART | Serial communication — talk to GPS modules, Bluetooth, PC terminal |
| SPI | Fast synchronous bus — displays, SD cards, flash memory |
| I2C | Two-wire bus — sensors, EEPROMs, many breakout boards |
| Timers | Count time, generate PWM, capture pulse widths |
| ADC | Analog-to-Digital — read voltages from sensors |
| DMA | Direct Memory Access — move data without CPU involvement |
| USB | USB device/host — your chip can be a USB keyboard or mass storage |
| CAN | Controller Area Network — automotive/industrial communication |
Each peripheral operates independently of the CPU. While your code is processing sensor data, the UART can be receiving bytes, the DMA can be moving ADC samples into RAM, and a timer can be generating PWM — all simultaneously, all in hardware.
The Bus System
Peripherals connect to the CPU through buses — digital highways for data:
- AHB (Advanced High-performance Bus): fast bus for GPIO, DMA, memory
- APB1 (Advanced Peripheral Bus 1): slower bus for UART, I2C, basic timers
- APB2 (Advanced Peripheral Bus 2): faster peripheral bus for SPI1, ADC, advanced timers
This matters because each bus has a maximum clock speed. On the STM32F411, APB1 maxes out at 50 MHz while APB2 can run at 100 MHz. Peripherals on APB2 can operate faster.
The Clock System
The clock system is the heartbeat of the microcontroller. It determines how fast everything runs. A typical STM32 clock tree:
```text
8 MHz HSE crystal
  → PLL (Phase-Locked Loop) multiplies to 96 MHz
  → SYSCLK = 96 MHz (CPU runs at this speed)
      → AHB  = 96 MHz (GPIO, DMA)
      → APB1 = 48 MHz (UART, I2C, basic timers)
      → APB2 = 96 MHz (SPI, ADC, advanced timers)
```
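Embassy programs the clock tree for you from the configuration passed to embassy_stm32::init (Default::default() typically leaves the chip running from an internal oscillator), but understanding it helps when debugging timing issues.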
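The arithmetic behind that clock tree can be checked on your PC. The sketch below models an STM32F4-style PLL, where the source is divided by M, multiplied by N, then divided by P to produce SYSCLK. The divider values in the usage note (M = 8, N = 384, P = 4) are an assumption: one combination that turns 8 MHz into 96 MHz, not necessarily what any particular board uses. The struct and function names are invented for this example; check the reference manual for the valid range of each divider.

```rust
/// Clock frequencies derived from one PLL configuration, in Hz.
#[derive(Debug, PartialEq)]
struct ClockTree {
    sysclk: u32,
    ahb: u32,
    apb1: u32,
    apb2: u32,
}

/// Compute the clock tree for an STM32F4-style PLL.
///
/// VCO = source / pll_m * pll_n, SYSCLK = VCO / pll_p.
/// AHB divides SYSCLK; APB1 and APB2 each divide AHB.
fn clock_tree(
    source_hz: u32,
    pll_m: u32,
    pll_n: u32,
    pll_p: u32,
    ahb_div: u32,
    apb1_div: u32,
    apb2_div: u32,
) -> ClockTree {
    let vco_in = source_hz / pll_m;
    let vco_out = vco_in * pll_n;
    let sysclk = vco_out / pll_p;
    let ahb = sysclk / ahb_div;
    ClockTree {
        sysclk,
        ahb,
        apb1: ahb / apb1_div,
        apb2: ahb / apb2_div,
    }
}

/// Check the bus limits quoted above for the STM32F411:
/// APB1 at most 50 MHz, APB2 at most 100 MHz.
fn within_f411_bus_limits(tree: &ClockTree) -> bool {
    tree.apb1 <= 50_000_000 && tree.apb2 <= 100_000_000
}
```

Calling clock_tree(8_000_000, 8, 384, 4, 1, 2, 1) reproduces the tree shown above: SYSCLK and AHB at 96 MHz, APB1 at 48 MHz, APB2 at 96 MHz. Changing the APB1 divider from 2 to 1 would push APB1 to 96 MHz, and within_f411_bus_limits would reject it.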
The Cortex-M Family
Not all ARM cores are equal. Here is what you will encounter across the STM32 lineup:
| Core | Features | STM32 Families | Typical Clock |
|---|---|---|---|
| Cortex-M0 / M0+ | Basic, low power, no FPU | F0, G0, L0 | 48-64 MHz |
| Cortex-M3 | Bit-banding, no FPU | F1 | 72 MHz |
| Cortex-M4F | DSP instructions, single-precision FPU | F3, F4, L4, G4, WB | 64-180 MHz |
| Cortex-M7 | Single- or double-precision FPU (varies by part), I/D cache, branch prediction | F7, H7 | 216-480 MHz |
| Cortex-M33 | TrustZone security, FPU, DSP | L5, U5, H5 | 110-250 MHz |
Why the FPU Matters
FPU stands for Floating Point Unit: dedicated hardware for arithmetic on fractional (floating-point) numbers. Without an FPU, calculating 3.14 * 2.0 requires dozens of integer instructions that emulate floating-point arithmetic in software. With an FPU, it is a single instruction.
The difference is dramatic: floating-point operations are often 10x to 50x faster. If your project involves sensor math, PID control loops, audio processing, or anything else with fractional numbers, choose a chip with an FPU (Cortex-M4F or higher).
Think About It: A Cortex-M0 running at 48 MHz without an FPU will be far slower at floating-point math than a Cortex-M4F running at the same 48 MHz. Raw clock speed is not the whole story.
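If you are stuck on a chip without an FPU, one common workaround is fixed-point arithmetic: store fractional values as scaled integers so the CPU only ever does integer math. To be clear, this is a different technique from the software floating-point emulation the compiler falls back to. The sketch below uses a Q16.16 format (16 integer bits, 16 fractional bits), with names chosen for this example; the float conversions exist only to make the result easy to inspect on a PC.

```rust
/// Number of fractional bits in the Q16.16 format.
const FRAC_BITS: u32 = 16;

/// Convert a float to Q16.16 (host-side helper for inspecting values).
fn to_fixed(x: f64) -> i32 {
    (x * (1u32 << FRAC_BITS) as f64).round() as i32
}

/// Convert a Q16.16 value back to a float (host-side helper).
fn to_float(x: i32) -> f64 {
    x as f64 / (1u32 << FRAC_BITS) as f64
}

/// Multiply two Q16.16 values using only integer operations.
fn fixed_mul(a: i32, b: i32) -> i32 {
    // Widen to 64 bits so the intermediate product cannot overflow,
    // then shift back down to 16 fractional bits.
    ((a as i64 * b as i64) >> FRAC_BITS) as i32
}
```

Multiplying the fixed-point forms of 3.14 and 2.0 with fixed_mul and converting back gives a value within a few millionths of 6.28. The price is a limited range and precision that you have to manage yourself.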
Memory-Mapped I/O: THE Key Concept
This is the single most important concept in embedded programming. Read this section twice.
On your desktop computer, hardware is accessed through drivers and operating system calls. On a microcontroller, hardware is accessed by reading and writing specific memory addresses.
Every peripheral, every register, every control bit lives at a fixed address in the chip's memory map. The addresses below are those of an STM32F4 part:
| Address Range | What Lives There |
|---|---|
| 0x0800_0000 - 0x0807_FFFF | Flash memory (your program; aliased to 0x0000_0000 when booting from Flash) |
| 0x2000_0000 - 0x2001_FFFF | RAM (your variables) |
| 0x4002_0000 - 0x4002_03FF | GPIOA registers |
| 0x4000_4400 - 0x4000_47FF | USART2 registers |
| 0x4001_3000 - 0x4001_33FF | SPI1 registers |
When you write a value to address 0x4002_0014, you are not writing to memory. You are changing the physical voltage on the GPIOA output pins. That is memory-mapped I/O: the hardware pretends to be memory, and writing to those "memory locations" controls real-world electrical signals.
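Here is what driving pin PA5 high looks like with raw register access (the way it is typically done in C, written here in unsafe Rust), versus Embassy:

```rust
use embassy_stm32::gpio::{Level, Output, Speed};

// Raw register access (what happens underneath).
// Assumes the GPIOA clock is enabled and PA5 is already configured as an output.
unsafe {
    // GPIOA base 0x4002_0000 + ODR offset 0x14
    let gpioa_odr = 0x4002_0014 as *mut u32;
    let current = core::ptr::read_volatile(gpioa_odr);
    // Set bit 5 of GPIOA ODR to drive PA5 high
    core::ptr::write_volatile(gpioa_odr, current | (1 << 5));
}

// Embassy (what you actually write); `p` comes from embassy_stm32::init
let mut pin = Output::new(p.PA5, Level::Low, Speed::Low);
pin.set_high();
```

Both end with PA5 driven high. Embassy also configures the pin as an output for you and wraps the horror in a safe, readable API.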
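You can experiment with the read-modify-write pattern on your PC before touching real hardware. The sketch below is a host-side model, not a driver: the helper names are invented for this example, and an ordinary variable stands in for the register. The base address and offset are the STM32F4 GPIOA values used in this section.

```rust
use core::ptr::{read_volatile, write_volatile};

/// GPIOA base address and ODR offset (STM32F4 values from the text).
const GPIOA_BASE: usize = 0x4002_0000;
const ODR_OFFSET: usize = 0x14;

/// Address of a register given its peripheral base and offset.
const fn reg_addr(base: usize, offset: usize) -> usize {
    base + offset
}

/// Set one bit with a volatile read-modify-write.
///
/// # Safety
/// `reg` must point to a valid, aligned `u32`. On real hardware that is a
/// peripheral register; on a PC it can be an ordinary variable.
unsafe fn set_bit(reg: *mut u32, bit: u32) {
    unsafe {
        let current = read_volatile(reg);
        write_volatile(reg, current | (1 << bit));
    }
}

/// Clear one bit with a volatile read-modify-write.
///
/// # Safety
/// Same requirements as `set_bit`.
unsafe fn clear_bit(reg: *mut u32, bit: u32) {
    unsafe {
        let current = read_volatile(reg);
        write_volatile(reg, current & !(1 << bit));
    }
}
```

On a PC, declare a `let mut fake_odr: u32 = 0;` and pass `&mut fake_odr` to set_bit with bit 5; the variable becomes 0x20. On the chip, the same function would receive reg_addr(GPIOA_BASE, ODR_OFFSET) cast to a `*mut u32`. The volatile calls matter because they stop the compiler from optimizing away or reordering accesses that have side effects in hardware.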
Registers: The 32-Bit Control Panel
A register is a 32-bit value at a fixed address that controls or reports the state of a peripheral. Think of it as a row of 32 tiny switches.
For example, the GPIO Output Data Register (ODR) for GPIOA:
```text
Bit:  31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
      [            Reserved (read as 0)             ]

Bit:  15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0
      P15 P14 P13 P12 P11 P10 P9  P8  P7  P6  P5  P4  P3  P2  P1  P0
```
Each bit directly controls one pin. Bit 5 = PA5. With the pin configured as an output, set the bit to 1 and the pin goes to 3.3V; set it to 0 and the pin goes to 0V.
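Reading the register layout in the other direction is just as useful when debugging: given a raw ODR value from a debugger, which pins are high? A minimal host-side sketch follows, with function names invented for this example.

```rust
/// Only the low 16 bits of ODR map to pins; the upper 16 are reserved.
const ODR_PIN_MASK: u32 = 0x0000_FFFF;

/// Return true if pin `n` (0 to 15) has its bit set in the given ODR value.
fn pin_is_high(odr: u32, n: u8) -> bool {
    n < 16 && ((odr >> n) & 1) == 1
}

/// List every pin number whose bit is set, lowest first.
fn pins_high(odr: u32) -> Vec<u8> {
    (0..16u8)
        .filter(|&n| pin_is_high(odr & ODR_PIN_MASK, n))
        .collect()
}
```

For an ODR value of 0x0000_2020, pins_high returns pins 5 and 13, because 0x20 is bit 5 and 0x2000 is bit 13.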
Registers come in three flavors:
- Control registers — you write to configure behavior (e.g., set a pin as input or output)
- Status registers — you read to check what happened (e.g., has a UART byte arrived?)
- Data registers — you read/write to exchange data (e.g., the byte to transmit via UART)
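The three flavors usually work together. The sketch below is a host-side stand-in for a UART, not a model of any real STM32 peripheral: the struct, the method names, and the bit positions are all hypothetical. It shows the typical pattern of configuring through a control register, polling a status register, and only then touching the data register.

```rust
/// Hypothetical bit positions, chosen for this sketch only.
const CTRL_ENABLE: u32 = 1 << 0;
const STATUS_RX_READY: u32 = 1 << 5;

/// A pretend UART. On a real chip these fields would be registers at
/// fixed addresses, accessed with volatile reads and writes.
struct FakeUart {
    control: u32, // control register: written to configure behavior
    status: u32,  // status register: read to see what happened
    data: u32,    // data register: carries the received byte
}

impl FakeUart {
    fn new() -> Self {
        FakeUart { control: 0, status: 0, data: 0 }
    }

    /// Control: switch the peripheral on by setting its enable bit.
    fn enable(&mut self) {
        self.control |= CTRL_ENABLE;
    }

    /// Stand-in for the hardware receiving a byte off the wire.
    /// A disabled peripheral ignores incoming data.
    fn simulate_rx(&mut self, byte: u8) {
        if (self.control & CTRL_ENABLE) != 0 {
            self.data = byte as u32;
            self.status |= STATUS_RX_READY;
        }
    }

    /// Status + data: return a byte only if the status flag says one arrived.
    fn try_read(&mut self) -> Option<u8> {
        if (self.status & STATUS_RX_READY) == 0 {
            return None;
        }
        // In this model, taking the byte clears the ready flag.
        self.status &= !STATUS_RX_READY;
        Some(self.data as u8)
    }
}
```

A byte simulated before enable() is ignored, so try_read returns None. After enable(), a simulated byte is returned exactly once and the flag clears.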
The Toolchain
You cannot compile code for your PC and run it on an STM32: the two use completely different processor architectures. You need a cross-compiler, a tool that runs on your PC but produces machine code for the ARM Cortex-M.
The Rust embedded toolchain consists of:
| Component | Purpose |
|---|---|
| rustup target add thumbv7em-none-eabihf | Install the cross-compilation target |
| probe-rs | Flash firmware and debug via ST-Link/DAPLink |
| defmt + defmt-rtt | Lightweight logging over the debug probe |
| cargo embed or cargo run (with probe-rs) | Build, flash, and run in one command |
The target triple thumbv7em-none-eabihf breaks down as:
- thumb: ARM Thumb instruction set
- v7em: ARMv7E-M architecture (Cortex-M4/M7)
- none: no operating system
- eabi: Embedded ABI (calling convention)
- hf: hardware floating point
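Other Cortex-M cores need a different target. The lookup below pairs the cores from the Cortex-M table earlier in this chapter with the Rust target commonly used for each. The enum and function names are invented for this example, and the mapping assumes the FPU variants where the core offers one.

```rust
/// Cortex-M variants from the table earlier in this chapter.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Core {
    CortexM0,   // also covers M0+
    CortexM3,
    CortexM4F,  // M4 with FPU
    CortexM7F,  // M7 with FPU
    CortexM33F, // M33 with FPU
}

/// Rust compilation target commonly used for each core.
fn rust_target(core: Core) -> &'static str {
    match core {
        Core::CortexM0 => "thumbv6m-none-eabi",
        Core::CortexM3 => "thumbv7m-none-eabi",
        Core::CortexM4F | Core::CortexM7F => "thumbv7em-none-eabihf",
        Core::CortexM33F => "thumbv8m.main-none-eabihf",
    }
}

/// True if the target uses the hardware floating-point ABI (the `hf` suffix).
fn uses_hard_float(target: &str) -> bool {
    target.ends_with("eabihf")
}
```

The string returned by rust_target is what you would pass to rustup target add. Note that the Cortex-M0 and Cortex-M3 targets lack the hf suffix, which matches their missing FPU.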
What Makes Embedded Different
If you are coming from desktop or web development, embedded is a different world:
| Desktop | Embedded |
|---|---|
| Operating system manages everything | No OS — your code is the only thing running |
| Print to terminal with println! | No terminal; use defmt over debug probe or UART |
| Program exits when done | Runs forever — main must never return |
| Timing is "fast enough" | Real-time — a 1 ms deadline means 1 ms, not "roughly 1 ms" |
| Bugs crash a program | Bugs affect physical hardware — wrong output can damage circuits |
| Gigabytes of RAM | Kilobytes of RAM — every byte matters |
Fun Fact: When NASA's Voyager 1 was reprogrammed in 2023, engineers uploaded new code to computers with about 69 KB of memory in total, less than the RAM on many mid-range STM32s. It has been running since 1977 and is now about 15 billion miles from Earth. That is embedded programming at its finest.
Summary
A microcontroller is a self-contained computer on a chip. It has a CPU, Flash, RAM, and peripherals — all controlled by reading and writing to specific memory addresses. The STM32 family uses ARM Cortex-M cores ranging from the basic M0 to the powerful M7, all programmed using the same Rust toolchain.
The key insight is memory-mapped I/O: hardware registers live at fixed addresses, and writing to those addresses controls physical pins and peripherals. Embassy abstracts this into safe Rust APIs, but the registers are always there underneath.
Next, we will explore the STM32 family in detail — which chip to choose, what the part numbers mean, and how to navigate the (enormous) datasheet.