Memory Architecture

If you have been following along on an STM32F4 or G0, you may not have thought about memory at all. That is by design — simpler chips keep things simple. But once you reach the STM32H7, memory becomes something you actively manage. This chapter explains why, and gives you the mental model to handle it.

The Simple World: F1, F4, G0

On most STM32 chips, memory looks like this:

| Region | Typical Size | Address Start | Notes |
|--------|--------------|---------------|-------|
| Flash | 64KB–1MB | 0x0800_0000 | Program code lives here |
| SRAM | 20KB–128KB | 0x2000_0000 | Variables, stack, heap |

Everything is on one bus. The CPU can read Flash and SRAM. The DMA controller can read and write SRAM. Peripherals talk to DMA, DMA talks to SRAM, the CPU reads SRAM. No caches, no tricks, no surprises.

CPU ──────┐
          ├──── Bus Matrix ──┬── Flash
DMA ──────┘                  ├── SRAM
                             └── Peripherals

Your linker script puts .text in Flash, .bss and .data in SRAM, and you never think about it again. This is the world most tutorials assume.
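As a rough illustration of that split, here is how a few ordinary Rust items usually end up distributed. The item names are made up for this sketch, and the exact placement is decided by the compiler and linker, so treat the comments as typical outcomes rather than guarantees.

```rust
use core::sync::atomic::{AtomicU32, Ordering};

// Immutable data: typically placed in .rodata, which lives in Flash.
static SINE_TABLE: [u8; 4] = [0, 90, 127, 90];

// Zero-initialised mutable state: typically placed in .bss (SRAM).
static TICKS: AtomicU32 = AtomicU32::new(0);

// Non-zero initial value: typically placed in .data (SRAM). The startup
// code copies the initial value from Flash before main runs.
static BAUD: AtomicU32 = AtomicU32::new(115_200);

// The function body is code, so it goes into .text (Flash).
// Plain load/store is used because it is available on every Cortex-M core;
// this is not an atomic read-modify-write.
fn tick() -> u32 {
    let next = TICKS.load(Ordering::Relaxed).wrapping_add(1);
    TICKS.store(next, Ordering::Relaxed);
    next
}
```

None of these items needs a placement attribute: on a simple chip the default sections are enough.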

Think About It: If you have only worked with Arduino or simple STM32 boards, you have been living in this simple world the entire time — and that is perfectly fine for most projects.

The F4 Trap: CCM RAM

The STM32F4 series has a sneaky extra region called CCM (Core Coupled Memory). On the F407, it is 64KB sitting at address 0x1000_0000.

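| Region | Size | DMA Access | CPU Access | Speed |
|--------|------|------------|------------|-------|
| Main SRAM | 128KB | Yes | Yes | Fast |
| CCM RAM | 64KB | No | Yes | Fastest |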

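CCM is directly wired to the CPU with zero wait states, which makes it the fastest RAM on the chip. But DMA cannot touch it. If you place a UART transmit buffer in CCM and then ask DMA to send it, the transfer does not work. The CPU does not fault and nothing panics, so unless you check the DMA stream's error flags the failure looks silent: nothing useful is sent, or the output is not what you put in the buffer.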

// DANGER: If this buffer lands in CCM, DMA transfers will silently fail
static mut TX_BUF: [u8; 256] = [0u8; 256];

The fix is straightforward: do not put DMA buffers in CCM. Use CCM for stack space or computation scratch pads where only the CPU reads and writes.
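One way to catch a misplaced buffer early is to check its address before handing it to DMA. The following is a minimal sketch, assuming the F407 figures quoted above (64KB of CCM at 0x1000_0000); the function name `overlaps_ccm` is invented for this example and is not part of any HAL.

```rust
/// Start of CCM RAM on the STM32F407 (figure taken from the text above).
const CCM_START: u32 = 0x1000_0000;
/// Size of CCM RAM on the STM32F407: 64KB.
const CCM_LEN: u32 = 64 * 1024;

/// Returns true if any byte of the range [addr, addr + len) lies inside CCM.
fn overlaps_ccm(addr: u32, len: u32) -> bool {
    let end = addr.saturating_add(len); // exclusive end of the buffer
    let ccm_end = CCM_START + CCM_LEN;  // exclusive end of CCM
    len > 0 && addr < ccm_end && end > CCM_START
}

// Possible use on target, just before starting a transfer:
//   debug_assert!(!overlaps_ccm(buf.as_ptr() as u32, buf.len() as u32));
```

A check like this costs a few cycles in debug builds and turns a confusing symptom into an immediate, readable assertion failure.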

Fun Fact: Many developers have lost hours debugging "DMA not working" on F4 chips, only to discover their buffer was allocated in CCM. If DMA output looks like garbage, check your memory map first.

The Complex World: STM32H7

The H7 is a different beast. It has seven distinct RAM regions, each with different properties:

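| Region | Size | Address | CPU | DMA | Cache | Best For |
|--------|------|---------|-----|-----|-------|----------|
| ITCM | 64KB | 0x0000_0000 | Yes | No | No | Fastest code execution |
| DTCM | 128KB | 0x2000_0000 | Yes | No | No | Fastest data, control variables |
| AXI SRAM | 512KB | 0x2400_0000 | Yes | Yes | Yes | Large general-purpose buffers |
| SRAM1 | 128KB | 0x3000_0000 | Yes | Yes | No* | DMA buffers |
| SRAM2 | 128KB | 0x3002_0000 | Yes | Yes | No* | DMA buffers |
| SRAM3 | 32KB | 0x3004_0000 | Yes | Yes | No* | DMA buffers |
| SRAM4 | 64KB | 0x3800_0000 | Yes | Yes | No* | D3 domain, low-power |

* "No" holds while the D-cache is disabled (its state after reset) or when an MPU region marks the range non-cacheable. Under the Cortex-M7 default memory map these addresses become cacheable once the D-cache is enabled.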

That is over 1MB of RAM total, but you cannot just treat it as one big pool.
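The total is easy to verify from the table. The sketch below simply adds up the sizes listed above; the constant and function names are made up for this example.

```rust
/// (region name, size in KB), copied from the table above.
const RAM_REGIONS_KB: [(&str, u32); 7] = [
    ("ITCM", 64),
    ("DTCM", 128),
    ("AXI SRAM", 512),
    ("SRAM1", 128),
    ("SRAM2", 128),
    ("SRAM3", 32),
    ("SRAM4", 64),
];

/// Sum of every region's size, in KB.
fn total_kb() -> u32 {
    RAM_REGIONS_KB.iter().map(|&(_, kb)| kb).sum()
}

/// Size in KB of the largest single region.
fn largest_region_kb() -> u32 {
    RAM_REGIONS_KB.iter().map(|&(_, kb)| kb).max().unwrap_or(0)
}
```

The sum comes to 1056KB, yet the largest single region is only 512KB, which is why a 600KB array cannot simply be declared and left to the linker.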

Why So Many Regions?

Each region connects to a different bus inside the chip. DTCM is wired straight to the Cortex-M7 core — zero wait states, no contention. AXI SRAM goes through the AXI bus and the L1 cache. SRAM1-3 sit on the AHB bus where DMA controllers live.

Cortex-M7 Core
  ├── ITCM (instruction tightly-coupled memory)
  ├── DTCM (data tightly-coupled memory)
  ├── L1 Cache ──── AXI Bus ──── AXI SRAM (512KB)
  └── AHB Bus ──── SRAM1/2/3 ──── DMA1, DMA2
                               ──── SRAM4 (D3 domain)

Which Memory for What

Here is the practical decision tree:

DTCM (128KB): For variables the CPU accesses constantly, such as PID loop state, filter coefficients, and control flags. It is the fastest data memory, but DMA cannot reach it.

SRAM1/2/3 (288KB total): For any buffer that DMA touches, such as UART RX/TX buffers, SPI transfer buffers, and ADC sample arrays. Keep these regions uncached (leave the D-cache off, or mark them non-cacheable in the MPU) and you avoid the cache coherency headaches.

AXI SRAM (512KB): For large data structures that the CPU works with, such as sensor log arrays, lookup tables, and image processing buffers. It goes through the cache, so it is fast for sequential CPU access but needs cache maintenance if DMA also touches it.

SRAM4 (64KB): Special region in the D3 domain, which remains usable in low-power modes where the other domains are switched off. Use it for data shared with low-power operation. It is not battery-backed: data that must survive loss of main power belongs in the separate 4KB Backup SRAM.

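// Each section named in #[link_section] must also be defined in the linker script.
// PidState and SensorReading stand in for your own application types.
// If the linker script marks these sections NOLOAD, the startup code does not
// apply the initialisers below: write the contents before you rely on them.

// Control variables: fast CPU access, no DMA needed
#[link_section = ".dtcm_bss"]
static mut PID_STATE: PidState = PidState::new();

// DMA buffer: SRAM1 rather than DTCM (unreachable by DMA) or AXI SRAM (cached)
#[link_section = ".sram1_bss"]
static mut UART_TX_BUF: [u8; 512] = [0u8; 512];

// Large working buffer: cached AXI SRAM is fine for CPU-only data
#[link_section = ".axisram_bss"]
static mut SENSOR_LOG: [SensorReading; 4096] = [SensorReading::ZERO; 4096];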
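To sanity-check placement while debugging, it can help to classify an address by region. The following sketch hard-codes the H743 ranges from the table above; the `Region` enum and both function names are invented for this example, and `dma_reachable` only mirrors the DMA column of that table.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Region {
    Itcm,
    Dtcm,
    AxiSram,
    Sram1,
    Sram2,
    Sram3,
    Sram4,
    Other,
}

/// Map an address to the RAM region that contains it (STM32H743 layout).
fn region_of(addr: u32) -> Region {
    match addr {
        0x0000_0000..=0x0000_FFFF => Region::Itcm,    // 64KB
        0x2000_0000..=0x2001_FFFF => Region::Dtcm,    // 128KB
        0x2400_0000..=0x2407_FFFF => Region::AxiSram, // 512KB
        0x3000_0000..=0x3001_FFFF => Region::Sram1,   // 128KB
        0x3002_0000..=0x3003_FFFF => Region::Sram2,   // 128KB
        0x3004_0000..=0x3004_7FFF => Region::Sram3,   // 32KB
        0x3800_0000..=0x3800_FFFF => Region::Sram4,   // 64KB
        _ => Region::Other,
    }
}

/// True if the table above lists the containing region as reachable by DMA.
fn dma_reachable(addr: u32) -> bool {
    matches!(
        region_of(addr),
        Region::AxiSram | Region::Sram1 | Region::Sram2 | Region::Sram3 | Region::Sram4
    )
}
```

On target, passing the address of a buffer (for example `buf.as_ptr() as u32`) to `region_of` and logging the result is a quick way to confirm that a `#[link_section]` attribute had the intended effect.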

Linker Script Basics

The linker script is the file that tells the linker where each section of your program goes in physical memory. On simple chips, a two-region memory.x (FLASH and RAM), which is what cortex-m-rt expects, works fine. On the H7, you need to customize it.

A minimal memory.x for the STM32H743:

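MEMORY
{
  FLASH  : ORIGIN = 0x08000000, LENGTH = 2M
  DTCM   : ORIGIN = 0x20000000, LENGTH = 128K
  AXISRAM : ORIGIN = 0x24000000, LENGTH = 512K
  SRAM1  : ORIGIN = 0x30000000, LENGTH = 128K
  SRAM2  : ORIGIN = 0x30020000, LENGTH = 128K
  SRAM3  : ORIGIN = 0x30040000, LENGTH = 32K
  SRAM4  : ORIGIN = 0x38000000, LENGTH = 64K
  ITCM   : ORIGIN = 0x00000000, LENGTH = 64K
}

/* cortex-m-rt expects a region named RAM for .data, .bss and the stack. */
/* Aliasing it to DTCM is one option; pick the region that suits your project. */
REGION_ALIAS("RAM", DTCM);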

SECTIONS
{
  .sram1_bss (NOLOAD) : {
    *(.sram1_bss .sram1_bss.*);
  } > SRAM1

  .axisram_bss (NOLOAD) : {
    *(.axisram_bss .axisram_bss.*);
  } > AXISRAM

  .dtcm_bss (NOLOAD) : {
    *(.dtcm_bss .dtcm_bss.*);
  } > DTCM
}

The default .bss and .data sections (your normal global variables) and the stack go into whichever region you map RAM to. On the H7, mapping RAM to DTCM gives them the fastest memory, but project templates differ, so check which region your memory.x actually assigns.

Think About It: The linker script is not magic — it is just a mapping from section names to address ranges. When you write #[link_section = ".sram1_bss"], you are telling the compiler "put this variable in the section called .sram1_bss," and the linker script says "that section lives in SRAM1 at address 0x3000_0000."
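One consequence of the (NOLOAD) keyword in the script above: the startup code zeroes .bss and copies .data, but it does not touch NOLOAD sections, so a static placed there starts with whatever the RAM happens to contain. A common way to make that explicit is to declare the buffer as `MaybeUninit` and initialise it by hand. The following is a minimal sketch; the function name is invented for this example, and the `#[link_section]` attribute is left out so that it also builds on a host.

```rust
use core::mem::MaybeUninit;

// On target this static would also carry #[link_section = ".sram1_bss"].
// MaybeUninit documents that the startup code does not initialise it.
static mut UART_TX_BUF: MaybeUninit<[u8; 512]> = MaybeUninit::uninit();

/// Zero the buffer and return a reference to it.
///
/// # Safety
/// Call this once, before anything else (CPU or DMA) uses the buffer.
unsafe fn init_uart_tx_buf() -> &'static mut [u8; 512] {
    unsafe {
        // Go through a raw pointer instead of taking a reference to the
        // static mut directly.
        let slot: &'static mut MaybeUninit<[u8; 512]> =
            &mut *core::ptr::addr_of_mut!(UART_TX_BUF);
        // write() stores the value and returns a reference to the contents.
        slot.write([0u8; 512])
    }
}
```

Calling `init_uart_tx_buf()` early in `main` gives you a zeroed `&'static mut [u8; 512]` that can then be handed to a driver.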

Cache Coherency — The H7 Gotcha

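AXI SRAM goes through the Cortex-M7's L1 data cache. This means the CPU might be reading a cached copy of memory while DMA writes new data to the actual SRAM, or DMA might read stale SRAM while the CPU's latest writes still sit in the cache. Either way, the two views get out of sync.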

Two solutions:

  1. Avoid the problem: Put DMA buffers in SRAM1-3 and keep that range uncached (leave the D-cache disabled, or mark the range non-cacheable in the MPU). This is the simplest approach and what Embassy does by default for its DMA buffers.

  2. Manage the cache: Invalidate the cache lines covering a buffer before the CPU reads DMA results, and clean them before DMA reads a buffer the CPU has written. This is error-prone and rarely worth the complexity.

// The easy way: just use the right memory region
#[link_section = ".sram1_bss"]
static mut ADC_DMA_BUF: [u16; 256] = [0u16; 256];
// No cache maintenance needed, provided SRAM1 is left uncached
// (D-cache disabled, or an MPU region marking it non-cacheable)
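If you do take the cache-management route, the detail that most often goes wrong is alignment. The Cortex-M7 data cache works in 32-byte lines, so cleaning or invalidating a buffer that shares a line with another variable also affects that variable. The sketch below shows one way to keep a buffer on its own lines. The type and function names are invented for this example; the maintenance calls themselves (for instance the cache methods on the cortex-m crate's SCB type) are not shown because they only run on target.

```rust
/// Cortex-M7 data cache line size in bytes.
const CACHE_LINE: usize = 32;

/// Wrapper that places its contents on a cache-line boundary.
/// The alignment also rounds the type's size up to a multiple of 32 bytes,
/// so nothing else can share the buffer's last cache line.
#[repr(C, align(32))]
struct CacheAligned<T>(T);

/// Round a byte count up to a whole number of cache lines.
const fn round_up_to_line(len: usize) -> usize {
    (len + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE
}

/// True if the range starts and ends on cache-line boundaries, meaning that
/// maintenance on it cannot affect neighbouring data.
fn is_line_exact(addr: usize, len: usize) -> bool {
    addr % CACHE_LINE == 0 && len % CACHE_LINE == 0
}
```

A DMA buffer declared as `CacheAligned<[u8; 100]>` occupies 128 bytes, so invalidating it after a transfer cannot discard unrelated data that the CPU has written but not yet flushed.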

Practical Advice

For F0, G0, F1, F3, L0, L4, G4 — do not worry about any of this. You have one RAM, everything works with DMA, there is no cache. Use the default linker script and move on.

For F4 — be aware of CCM. If DMA mysteriously fails, check whether your buffer is in CCM. Otherwise, treat it like the simple chips.

For H7 — take 30 minutes to set up your linker script correctly at the start of the project. Put DMA buffers in SRAM1. Put your stack and hot variables in DTCM. Use AXI SRAM for large CPU-only data. Then you can stop thinking about it for the rest of the project.

Fun Fact: The STM32H7's total internal RAM (over 1MB) is larger than the entire Flash memory of many STM32F0 chips. Memory architecture reflects the enormous range of complexity in the STM32 family.

Summary

| Chip Family | Memory Model | Key Concern |
|-------------|--------------|-------------|
| F0, G0, L0 | One RAM, simple | None |
| F1, F3, L4, G4 | One RAM, simple | None |
| F4 | Main SRAM + CCM | CCM is CPU-only, no DMA |
| F7 | Multiple SRAM + cache | Cache coherency with DMA |
| H7 | 7 RAM regions + cache | Region placement for DMA, cache coherency |

Start simple. Graduate to complexity only when your chip demands it.