Memory Architecture
If you have been following along on an STM32F4 or G0, you may not have thought about memory at all. That is by design — simpler chips keep things simple. But once you reach the STM32H7, memory becomes something you actively manage. This chapter explains why, and gives you the mental model to handle it.
The Simple World: F1, F4, G0
On most STM32 chips, memory looks like this:
| Region | Typical Size | Address Start | Notes |
|---|---|---|---|
| Flash | 64KB–1MB | 0x0800_0000 | Program code lives here |
| SRAM | 20KB–128KB | 0x2000_0000 | Variables, stack, heap |
Everything is on one bus. The CPU can read Flash and SRAM. The DMA controller can read and write SRAM. Peripherals talk to DMA, DMA talks to SRAM, the CPU reads SRAM. No caches, no tricks, no surprises.
    CPU ──────┐
              ├──── Bus Matrix ──── Flash
    DMA ──────┤                ──── SRAM
              ├──── Peripherals
Your linker script puts .text in Flash, .bss and .data in SRAM, and you never think about it again. This is the world most tutorials assume.
Think About It: If you have only worked with Arduino or simple STM32 boards, you have been living in this simple world the entire time — and that is perfectly fine for most projects.
The F4 Trap: CCM RAM
Many STM32F4 chips have a sneaky extra region called CCM (Core Coupled Memory). On the F407, it is 64KB sitting at address 0x1000_0000.
| Region | Size | DMA Access | CPU Access | Speed |
|---|---|---|---|---|
| Main SRAM | 128KB | Yes | Yes | Fast |
| CCM RAM | 64KB | No | Yes | Fastest |
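CCM is wired directly to the CPU with zero wait states, which makes it the fastest RAM on the chip. But DMA cannot reach it. If you place a UART transmit buffer in CCM and then ask DMA to send it, the transfer does not go through. There is no crash and no fault in your code; at most the DMA stream raises a transfer-error flag that is easy to overlook. From the outside it looks as if DMA silently sent zeros or stale data.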
    // DANGER: If this buffer lands in CCM, DMA transfers will silently fail
    static mut TX_BUF: [u8; 256] = [0u8; 256];
The fix is straightforward: do not put DMA buffers in CCM. Use CCM for stack space or computation scratch pads where only the CPU reads and writes.
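One way to catch this early is a guard that checks a buffer's address range before it is handed to DMA. The sketch below runs on a desktop host and uses the F407 addresses from the table above. `Region`, `MAIN_SRAM`, `CCM`, and `dma_can_reach` are illustrative names, not part of any HAL, and the check is deliberately conservative: anything outside main SRAM is treated as unsafe.

```rust
/// Address range of a RAM region: [start, start + len).
#[derive(Clone, Copy)]
pub struct Region {
    pub start: u32,
    pub len: u32,
}

impl Region {
    /// True if the whole buffer [addr, addr + size) lies inside this region.
    pub const fn contains(&self, addr: u32, size: u32) -> bool {
        // Widen to u64 so `addr + size` cannot overflow.
        let begin = addr as u64;
        let end = begin + size as u64;
        let region_begin = self.start as u64;
        let region_end = region_begin + self.len as u64;
        begin >= region_begin && end <= region_end
    }
}

// STM32F407 values taken from the table above (assumed for this sketch).
pub const MAIN_SRAM: Region = Region { start: 0x2000_0000, len: 128 * 1024 };
pub const CCM: Region = Region { start: 0x1000_0000, len: 64 * 1024 };

/// Hypothetical guard: accept a buffer only if it sits entirely in main SRAM.
pub fn dma_can_reach(addr: u32, size: u32) -> bool {
    MAIN_SRAM.contains(addr, size)
}
```

On a real target, a call such as `debug_assert!(dma_can_reach(buf.as_ptr() as u32, buf.len() as u32))` just before starting a transfer would turn a silent failure into an immediate panic during development.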
Fun Fact: Many developers have lost hours debugging "DMA not working" on F4 chips, only to discover their buffer was allocated in CCM. If DMA output looks like garbage, check your memory map first.
The Complex World: STM32H7
The H7 is a different beast. It has seven distinct RAM regions, each with different properties:
| Region | Size | Address | CPU | DMA | Cache | Best For |
|---|---|---|---|---|---|---|
| ITCM | 64KB | 0x0000_0000 | Yes | No | No | Fastest code execution |
| DTCM | 128KB | 0x2000_0000 | Yes | No | No | Fastest data, control variables |
| AXI SRAM | 512KB | 0x2400_0000 | Yes | Yes | Yes | Large general-purpose buffers |
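| SRAM1 | 128KB | 0x3000_0000 | Yes | Yes | No (*) | DMA buffers |
| SRAM2 | 128KB | 0x3002_0000 | Yes | Yes | No (*) | DMA buffers |
| SRAM3 | 32KB | 0x3004_0000 | Yes | Yes | No (*) | DMA buffers |
| SRAM4 | 64KB | 0x3800_0000 | Yes | Yes | No (*) | D3 domain, low-power |
(*) "No" describes the usual setup, not a hardware guarantee. The D-cache is off at reset, and once you enable it these regions are cacheable under the Cortex-M7 default memory map unless the MPU marks them non-cacheable.
That is over 1MB of RAM total, but you cannot just treat it as one big pool.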
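The table can also be written down as data, which makes it easy to ask "which region does this address belong to?" The following host-runnable sketch copies the STM32H743 values from the table above. `RamRegion`, `H7_RAM`, `region_of`, and `total_kb` are made-up names for illustration, and the `dma` flag only reflects the table's DMA column.

```rust
/// One entry of the STM32H743 RAM map, using the values from the table above.
pub struct RamRegion {
    pub name: &'static str,
    pub start: u32,
    pub size_kb: u32,
    /// DMA column of the table (general-purpose DMA controllers).
    pub dma: bool,
}

pub static H7_RAM: [RamRegion; 7] = [
    RamRegion { name: "ITCM", start: 0x0000_0000, size_kb: 64, dma: false },
    RamRegion { name: "DTCM", start: 0x2000_0000, size_kb: 128, dma: false },
    RamRegion { name: "AXI SRAM", start: 0x2400_0000, size_kb: 512, dma: true },
    RamRegion { name: "SRAM1", start: 0x3000_0000, size_kb: 128, dma: true },
    RamRegion { name: "SRAM2", start: 0x3002_0000, size_kb: 128, dma: true },
    RamRegion { name: "SRAM3", start: 0x3004_0000, size_kb: 32, dma: true },
    RamRegion { name: "SRAM4", start: 0x3800_0000, size_kb: 64, dma: true },
];

/// Find the region that contains `addr`, if any.
pub fn region_of(addr: u32) -> Option<&'static RamRegion> {
    H7_RAM.iter().find(|r| {
        // Use u64 so the exclusive end address cannot overflow.
        let start = r.start as u64;
        let end = start + (r.size_kb as u64) * 1024;
        (addr as u64) >= start && (addr as u64) < end
    })
}

/// Total internal RAM in KB, summed over all regions.
pub fn total_kb() -> u32 {
    H7_RAM.iter().map(|r| r.size_kb).sum()
}
```

Here `total_kb()` returns 1056, which is where the "over 1MB" figure comes from, and `region_of(0x3000_0000)` resolves to the SRAM1 entry.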
Why So Many Regions?
Each region connects to a different bus inside the chip. DTCM is wired straight to the Cortex-M7 core — zero wait states, no contention. AXI SRAM goes through the AXI bus and the L1 cache. SRAM1-3 sit on the AHB bus where DMA controllers live.
    Cortex-M7 Core
      ├── ITCM (instruction tightly-coupled memory)
      ├── DTCM (data tightly-coupled memory)
      ├── L1 Cache ──── AXI Bus ──── AXI SRAM (512KB)
      └── AHB Bus ──── SRAM1/2/3 ──── DMA1, DMA2
                   ──── SRAM4 (D3 domain)
Which Memory for What
Here is the practical decision tree:
DTCM (128KB): Use for variables the CPU accesses constantly: PID loop state, filter coefficients, control flags. It is the fastest data memory, but DMA cannot reach it.
SRAM1/2/3 (288KB total): Use for any buffer that DMA touches: UART RX/TX buffers, SPI transfer buffers, ADC sample arrays. Keep these regions uncached (leave the D-cache off, or mark them non-cacheable with the MPU if you enable it) and you avoid the cache coherency headaches.
AXI SRAM (512KB): Use for large data structures that the CPU works with: sensor log arrays, lookup tables, image processing buffers. It goes through the cache, so it is fast for sequential CPU access but needs cache maintenance if DMA also touches it.
SRAM4 (64KB): Sits in the D3 domain, which can stay powered while the rest of the chip is in a low-power mode. Use it for data shared with low-power operation. It is not battery-backed, so data that must survive a loss of main power belongs in the separate 4KB backup SRAM instead.
    // In your linker script or with #[link_section]:
    // NOTE: if these sections are NOLOAD, startup code does not apply the
    // initializers below; write the initial values yourself at boot.
    // Control variables: fast CPU access, no DMA needed
    #[link_section = ".dtcm_bss"]
    static mut PID_STATE: PidState = PidState::new();
    // DMA buffer: keep it in SRAM1, not DTCM (no DMA) or AXI (cached)
    #[link_section = ".sram1_bss"]
    static mut UART_TX_BUF: [u8; 512] = [0u8; 512];
    // Large working buffer: cached AXI SRAM is fine
    #[link_section = ".axisram_bss"]
    static mut SENSOR_LOG: [SensorReading; 4096] = [SensorReading::ZERO; 4096];
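The decision tree can also be condensed into a small function, which is sometimes handy as a checklist in code review. This is only a sketch of the rules listed above. `Region`, `Needs`, `DTCM_BYTES`, and `pick_region` are hypothetical names, and SRAM1/2/3 are treated as one pool.

```rust
/// RAM choices from the decision tree above (SRAM1/2/3 treated as one pool).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Region {
    Dtcm,
    Sram123,
    AxiSram,
    Sram4,
}

/// Hypothetical description of what a variable or buffer needs.
#[derive(Debug, Clone, Copy)]
pub struct Needs {
    /// Read or written by DMA.
    pub dma: bool,
    /// Accessed constantly by the CPU (control loops, filters).
    pub hot: bool,
    /// Shared with low-power operation.
    pub low_power: bool,
    /// Size in bytes.
    pub size: usize,
}

/// DTCM capacity on the STM32H743, from the region table.
pub const DTCM_BYTES: usize = 128 * 1024;

/// Apply the rules in order: low-power data, then DMA buffers,
/// then small hot data, and everything else goes to AXI SRAM.
pub fn pick_region(n: Needs) -> Region {
    if n.low_power {
        Region::Sram4
    } else if n.dma {
        Region::Sram123
    } else if n.hot && n.size <= DTCM_BYTES {
        Region::Dtcm
    } else {
        Region::AxiSram
    }
}
```

The ordering encodes the priorities: a DMA requirement overrides "hot", because DTCM is off limits to DMA no matter how often the CPU touches the data.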
Linker Script Basics
The linker script is the file that tells the linker where each section of your program goes in physical memory. On simple chips, a two-region memory.x (FLASH and RAM) is all that cortex-m-rt needs. On the H7, you need to customize it.
A minimal memory.x for the STM32H743:
    MEMORY
    {
      FLASH   : ORIGIN = 0x08000000, LENGTH = 2M
      /* cortex-m-rt requires a region named RAM; here it is the DTCM */
      RAM     : ORIGIN = 0x20000000, LENGTH = 128K
      AXISRAM : ORIGIN = 0x24000000, LENGTH = 512K
      SRAM1   : ORIGIN = 0x30000000, LENGTH = 128K
      SRAM2   : ORIGIN = 0x30020000, LENGTH = 128K
      SRAM3   : ORIGIN = 0x30040000, LENGTH = 32K
      SRAM4   : ORIGIN = 0x38000000, LENGTH = 64K
      ITCM    : ORIGIN = 0x00000000, LENGTH = 64K
    }
    SECTIONS
    {
      .sram1_bss (NOLOAD) : {
        *(.sram1_bss .sram1_bss.*);
      } > SRAM1
      .axisram_bss (NOLOAD) : {
        *(.axisram_bss .axisram_bss.*);
      } > AXISRAM
      .dtcm_bss (NOLOAD) : {
        *(.dtcm_bss .dtcm_bss.*);
      } > RAM
    }
The default .bss and .data sections (your normal global variables and the stack) go into whichever region you map RAM to. On the H7, Embassy typically maps RAM to DTCM for maximum speed.
Think About It: The linker script is not magic. It is just a mapping from section names to address ranges. When you write #[link_section = ".sram1_bss"], you are telling the compiler "put this variable in the section called .sram1_bss," and the linker script says "that section lives in SRAM1 at address 0x3000_0000."
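One consequence of the `(NOLOAD)` sections is worth spelling out: startup code only zeroes `.bss` and copies `.data`, so a static placed in a custom NOLOAD section starts with whatever bytes happen to be in RAM. A common way to make that explicit is to declare the static as `MaybeUninit` and initialize it at runtime. The sketch below shows that pattern under stated assumptions: `UART_TX_BUF` and `init_tx_buf` are illustrative names, and the `#[link_section]` attribute is left as a comment so the code also builds on a desktop host.

```rust
use core::mem::MaybeUninit;
use core::ptr::addr_of_mut;

// On target this static would carry `#[link_section = ".sram1_bss"]`;
// the attribute is omitted here so the sketch builds on any host.
static mut UART_TX_BUF: MaybeUninit<[u8; 512]> = MaybeUninit::uninit();

/// Zero the buffer once at startup and hand out a mutable reference.
///
/// # Safety
/// Must be called exactly once, before anything else (including DMA)
/// uses the buffer, so that no second `&mut` to it ever exists.
pub unsafe fn init_tx_buf() -> &'static mut [u8; 512] {
    // SAFETY: the caller guarantees exclusive, one-time access.
    let slot: &'static mut MaybeUninit<[u8; 512]> =
        unsafe { &mut *addr_of_mut!(UART_TX_BUF) };
    // `write` stores a fully initialized array without reading the old bytes
    // and returns a reference to the now-initialized contents.
    slot.write([0u8; 512])
}
```

Calling `unsafe { init_tx_buf() }` early in `main` yields a `&'static mut [u8; 512]` that can then be passed to a driver. The type system now records that the memory had no defined contents before that call.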
Cache Coherency — The H7 Gotcha
AXI SRAM goes through the Cortex-M7's L1 data cache. This means the CPU might be reading a cached copy of memory while DMA writes new data to the actual SRAM. They get out of sync.
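The stale-read case can be reproduced with a toy model on a desktop machine. The sketch below is not how the hardware is implemented. `CachedWord` and its methods are made-up names that model a single word of AXI SRAM plus the CPU's cached copy of it, and `invalidate` simply means "throw the cached copy away".

```rust
/// Toy model of one cached word: `ram` is the real SRAM cell,
/// `cache` is the CPU's private copy (None = not currently cached).
pub struct CachedWord {
    ram: u32,
    cache: Option<u32>,
}

impl CachedWord {
    pub fn new(value: u32) -> Self {
        Self { ram: value, cache: None }
    }

    /// CPU read: fills the cache on a miss, then returns the cached copy.
    pub fn cpu_read(&mut self) -> u32 {
        *self.cache.get_or_insert(self.ram)
    }

    /// DMA write: goes straight to RAM and never touches the cache.
    pub fn dma_write(&mut self, value: u32) {
        self.ram = value;
    }

    /// Invalidate: drop the cached copy so the next CPU read refetches from RAM.
    pub fn invalidate(&mut self) {
        self.cache = None;
    }
}
```

Starting from `CachedWord::new(1)`, a `cpu_read()` caches the value 1. After `dma_write(2)` the next `cpu_read()` still returns 1, and only after `invalidate()` does the CPU see 2.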
Two solutions:
- Avoid the problem: Put DMA buffers in SRAM1-3 and keep those regions uncached. This is the simplest approach and what Embassy does by default for its DMA buffers.
- Manage the cache: Clean the cache before DMA reads a buffer the CPU has written (for example, a UART TX buffer), and invalidate it before the CPU reads a buffer that DMA has filled (for example, ADC samples). This is error-prone and rarely worth the complexity.
    // The easy way: just use the right memory region
    #[link_section = ".sram1_bss"]
    static mut ADC_DMA_BUF: [u16; 256] = [0u16; 256];
    // No cache maintenance needed as long as SRAM1 is left uncached
    // (D-cache off, or the region marked non-cacheable via the MPU)
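If you do end up managing the cache by hand, part of what makes it error-prone is granularity. The Cortex-M7 data cache works in 32-byte lines, so cleaning or invalidating a buffer really affects every line the buffer overlaps, including bytes that belong to neighboring variables. The sketch below shows two defensive habits: a buffer type aligned to a cache line, and a helper that rounds an address range outward to whole lines. `CACHE_LINE`, `DmaBuf`, and `cache_line_span` are illustrative names. The actual maintenance calls are not shown; the `cortex-m` crate exposes by-address clean and invalidate operations on `SCB`, and their exact signatures should be checked in its documentation.

```rust
/// Cortex-M7 L1 data cache line size in bytes.
pub const CACHE_LINE: usize = 32;

/// Buffer wrapper whose start is cache-line aligned. Because a type's size is
/// always a multiple of its alignment, it also occupies whole cache lines,
/// so maintenance on it never touches a neighboring variable.
#[repr(C, align(32))]
pub struct DmaBuf<const N: usize>(pub [u8; N]);

/// Round the byte range [addr, addr + len) outward to whole cache lines.
/// Returns (aligned_start, aligned_len).
pub fn cache_line_span(addr: usize, len: usize) -> (usize, usize) {
    let start = addr & !(CACHE_LINE - 1);
    let end = (addr + len + CACHE_LINE - 1) & !(CACHE_LINE - 1);
    (start, end - start)
}
```

For a 40-byte buffer at 0x2400_0010, `cache_line_span` returns `(0x2400_0000, 64)`: two full lines, 24 bytes of which belong to something else. Wrapping the buffer in `DmaBuf` avoids that overlap.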
Practical Advice
For F0, G0, F1, F3, L0, L4, G4: do not worry about most of this. There is no data cache, and the main SRAM works with DMA. A few parts add a small extra region (some F3 devices have a CCM RAM of their own, for example), so glance at the memory map in the datasheet once. Otherwise, use the default linker script and move on.
For F4: be aware of CCM. If DMA mysteriously fails, check whether your buffer is in CCM. Otherwise, treat it like the simple chips.
For H7: take 30 minutes to set up your linker script correctly at the start of the project. Put DMA buffers in SRAM1. Put your stack and hot variables in DTCM. Use AXI SRAM for large CPU-only data. Then you can stop thinking about it for the rest of the project.
Fun Fact: The STM32H7's total internal RAM (over 1MB) is larger than the entire Flash memory of many STM32F0 chips. Memory architecture reflects the enormous range of complexity in the STM32 family.
Summary
| Chip Family | Memory Model | Key Concern |
|---|---|---|
| F0, G0, L0 | One RAM, simple | None |
| F1, F3, L4, G4 | One RAM, simple | None |
| F4 | Main SRAM + CCM | CCM is CPU-only, no DMA |
| F7 | Multiple SRAM + cache | Cache coherency with DMA |
| H7 | 7 RAM regions + cache | Region placement for DMA, cache coherency |
Start simple. Graduate to complexity only when your chip demands it.