The Secret of ST-Link High-Speed Flashing, Part 2 (Hardware Implementation)

Author: 云深无际
Published 2026-03-03 12:35:46
Collected in the column: 云深之无迹

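MCU code can run entirely from RAM, and in many situations doing so is not just possible but the correct, required design.

Start from how an MCU executes: the CPU does not care whether it is running from Flash or RAM

This is the key to everything that follows. From the CPU's point of view, only one thing ever happens:

Fetch → decode → execute

Flash, RAM, ROM and external SRAM all look the same to the CPU. As long as a memory is mapped into the address space and is executable, the CPU can fetch instructions from it and run them. (The CPU is not picky; it has no notion of what kind of memory sits behind an address.)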

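Why does code run from Flash by default?

This is an engineering choice, not a hard limit. Flash holds its contents at power-up (it is non-volatile), does not consume scarce RAM, and is cheap. (Some Rigol oscilloscopes take a long time to boot, which essentially comes down to the FPGA taking too long to read its Flash.) So after reset the PC points into Flash, the startup code lives in Flash and main() lives in Flash, but all of that is only the default arrangement. (Keep an eye on the PC: it is the one thing the CPU follows.)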

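When does code have to run from RAM?

You have either run into the following scenarios already or will sooner or later.

Flash is being erased or programmed

This is the classic case.

While Flash is being erased or programmed, it usually cannot be read at the same time

If code executes from Flash while that same Flash is being erased or programmed, the CPU stalls at best and ends up in a HardFault at worst. The fix is to run the Flash operation routines from RAM, for example in a bootloader, during an OTA update, or when writing a parameter area.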

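Code that is extremely sensitive to execution speed

RAM usually has lower access latency and needs no wait states, so interrupt handlers, DSP or control loops, and high-frequency bit-banging run faster and more deterministically from RAM.

Bus contention with certain peripherals

On STM32, for example, Flash is reached over the ICode/DCode/AHB buses, while DMA, USB and Ethernet compete for the bus at the same time. Placing critical code in RAM can reduce jitter and improve real-time behavior.

Dynamic loading, decompression or relocation

Examples: a bootloader that decompresses the application into RAM, code loaded from external Flash or SPI Flash, and self-modifying code (very rare, but it exists).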

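How does an MCU actually run code from RAM?

The core mechanism comes down to two steps:

Linking + relocation

Link time: tell the linker that this code belongs in RAM

In the linker script (using STM32 as an example):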

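.ramfunc :
{
    . = ALIGN(4);
    *(.ramfunc*)
    . = ALIGN(4);
} > RAM AT> FLASH   /* runs from RAM, load image kept in FLASH (region names assumed) */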

Mark the function in the source

__attribute__((section(".ramfunc")))
void flash_write_page(void)
{
    // code in this function is placed in RAM
}

Or (common on STM32):

__RAM_FUNC void flash_write_page(void)
{
}

At startup: copy from Flash to RAM

The startup code does the following:

.ramfunc in Flash (load address)
    ↓ memcpy
.ramfunc in RAM (run address)

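After that, a call to the function branches to its RAM address and the CPU fetches instructions from RAM. From the CPU's point of view, all of this is transparent.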
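The copy step can be sketched in plain C. This is a minimal host-side model, not the startup code of any particular toolchain: the name `copy_ramfunc` is hypothetical, and on a real target the three pointers would normally come from linker-script symbols (a load address in Flash plus start and end addresses in RAM) rather than from ordinary arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy a section word by word from its load address (Flash) to its
 * run address (RAM). Hypothetical helper: real startup code usually
 * takes these pointers from symbols defined in the linker script.
 */
size_t copy_ramfunc(uint32_t *ram_start, const uint32_t *ram_end,
                    const uint32_t *flash_load)
{
    assert(ram_start <= ram_end);

    size_t words = 0;
    for (uint32_t *dst = ram_start; dst < ram_end; ++dst) {
        *dst = *flash_load++;   /* one 32-bit word per iteration */
        ++words;
    }
    return words;               /* number of words copied */
}
```

Only after this copy has run is it safe to call a function that was linked to a RAM address.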

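Can the whole program live in RAM?

Technically yes, but it is rarely done in practice, for two very practical reasons: RAM is small, and its contents are lost when power is removed.

A typical split looks like this: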

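| Content | Location |
| --- | --- |
| Startup code | Flash |
| main / application logic | Flash |
| Flash operation routines | RAM |
| High-frequency ISRs | RAM |
| Temporarily loaded modules | RAM |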

Limitations of running code from RAM

Constants and functions that live in Flash are off limits

Inside a RAM function:

const uint32_t table[] = {...}; // lives in Flash

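If Flash is being erased or programmed, accessing this table can stall or fault. Either place the constants in RAM as well, or do not touch them while the erase or program operation is in progress.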

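Location of the interrupt vector table

If interrupt-related code runs from RAM, ask:

Is the vector table still in Flash?

Is Flash being erased or programmed?

If so, the vector table can be relocated to RAM as well (on cores that implement VTOR)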

SCB->VTOR = RAM_VECTOR_TABLE_ADDR;
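
Relocating the table takes more than one register write: the table has to be copied into RAM first, and its RAM address has to satisfy the alignment rule of VTOR. The sketch below only computes that alignment on the host. It assumes the Cortex-M3/M4 rule (round the total number of vectors up to a power of two, with a minimum of 32 words, 4 bytes per word). `vtor_alignment_bytes` is a hypothetical helper, and the rule for a given core should be checked against its reference manual.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Required alignment, in bytes, of a relocated vector table.
 * n_irqs is the number of external interrupts; the 16 system
 * exception slots are added on top. Assumes the Cortex-M3/M4 rule.
 */
uint32_t vtor_alignment_bytes(uint32_t n_irqs)
{
    assert(n_irqs <= 240);          /* Cortex-M3/M4 support at most 240 IRQs */

    uint32_t words = 16u + n_irqs;  /* total table entries */
    uint32_t align_words = 32u;     /* minimum: 32 words = 128 bytes */
    while (align_words < words) {
        align_words <<= 1;          /* next power of two */
    }
    return align_words * 4u;
}
```

For example, a part with 21 interrupts needs a 37-entry table, which rounds up to 64 words, so the RAM copy has to start on a 256-byte boundary.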

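Cache / MPU / XIP issues (high-end MCUs)

On MCUs with an I-Cache, D-Cache or MPU, two things need to be confirmed: is the RAM region executable, and are the caches coherent with the code that was just copied in?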

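Detailed analysis

The core idea of flash_loader.c

It does not write Flash directly from the PC. Instead, the PC (the stlink tool) uses SWD to write a small assembly routine that programs Flash into the target MCU's SRAM, then uses SWD to set the target's registers (R0-R3 and PC) so that the MCU starts executing the loader from SRAM.

The loader runs inside the MCU: it reads data from SRAM and writes it to the Flash registers and Flash addresses. When it is done, it stops on a BKPT (breakpoint) so that stlink knows it has finished.

The file includes <stlink.h>, stm32_register.h, read_write.h, logging.h and so on. That tells us this code runs on the PC (the host) and reads and writes the target MCU's registers and memory through stlink's SWD debug interface.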

Two key macros

#define FLASH_REGS_BANK2_OFS      0x40
#define FLASH_BANK2_START_ADDR    0x08080000

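These two exist for the special dual-bank mapping of STM32F1_XL parts: when the target address lies at or beyond the start of bank 2, the loader has to use a different Flash register base offset (passed in through R3).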

Watchdog key register addresses

#define STM32F0_WDG_KR            0x40003000
#define STM32H7_WDG_KR            0x58004800
#define STM32F0_WDG_KR_KEY_RELOAD 0xAAAA

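These are used to reload ("feed") the watchdog just before the loader runs, so that the IWDG does not reset the target MCU while a slow Flash write is in progress.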

loader_code_stm32xxx[]: what are these byte arrays?

The file contains a whole series of them:

loader_code_stm32vl

loader_code_stm32f0

loader_code_stm32lx

loader_code_stm32f4 / _lv

loader_code_stm32l4

loader_code_stm32f7 / _lv

loader_code_stm32wb0

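They are precompiled Thumb machine code (built from the flashloaders/*.s files named in the comments). Their job is to:

run "copy data + poll Flash busy + write + decrement the counter + stop on BKPT" on the target MCU.

You can tell from the 0x00, 0xbe pair that appears near the end of each array: 0xBE00 is the Thumb BKPT 0 (breakpoint) instruction. When the loader finishes it hits the BKPT, the core halts, and the PC side finds out by polling stlink_is_core_halted().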
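
To make the BKPT observation concrete, here is a small host-side sketch that scans a loader array for the first halfword of the form 0xBExx. The bytes are those of loader_code_stm32lx from the listing at the end of this article; the names `loader_lx` and `find_bkpt` are hypothetical. The scan is deliberately naive and would be fooled by literal data or by the second half of a 32-bit Thumb-2 instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes of loader_code_stm32lx, copied from the listing in this article. */
const uint8_t loader_lx[] = {
    0x04, 0x68, 0x0c, 0x60,
    0x04, 0x30, 0x04, 0x31,
    0x04, 0x3a, 0xf9, 0xdc,
    0x00, 0xbe, 0x00, 0x00
};

/*
 * Return the byte offset of the first 16-bit halfword of the form
 * 0xBExx (Thumb BKPT #imm8), or -1 if there is none.
 * Hypothetical helper; it does not tell instructions from data.
 */
long find_bkpt(const uint8_t *code, size_t len)
{
    assert(len % 2 == 0);
    for (size_t off = 0; off + 1 < len; off += 2) {
        /* Thumb halfwords are stored little-endian */
        uint16_t hw = (uint16_t)(code[off] | (code[off + 1] << 8));
        if ((hw & 0xFF00u) == 0xBE00u) {
            return (long)off;
        }
    }
    return -1;
}
```

For this array the BKPT sits at byte offset 12, followed by two padding bytes that keep the array size a multiple of 4.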

stlink_flash_loader_init(): what does initialization do, and why?

stlink_write_debug32(DHCSR, DBGKEY|C_DEBUGEN|C_HALT);
stlink_write_debug32(DHCSR, DBGKEY|C_DEBUGEN|C_HALT|C_MASKINTS);

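Halt first, then mask interrupts

The comment is blunt about it: following ARM DDI0419C, force a halt first and only then mask interrupts. The reason is that changing C_MASKINTS is only well defined while C_HALT is already set. If the core is still running, especially while it is entering or leaving an exception, writing MASKINTS directly can leave it in an unpredictable state. Once the core is halted and its state is stable, setting MASKINTS is safe.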
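
The two DHCSR writes can be spelled out as numbers. This sketch uses the architectural bit positions of DHCSR (DBGKEY in the upper halfword, C_DEBUGEN bit 0, C_HALT bit 1, C_MASKINTS bit 3). The macro names are shortened stand-ins for the STM32_REG_DHCSR_* constants in the source, and `dhcsr_value` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* DHCSR fields (Debug Halting Control and Status Register) */
#define DHCSR_DBGKEY     0xA05F0000u /* key that must accompany every write */
#define DHCSR_C_DEBUGEN  (1u << 0)
#define DHCSR_C_HALT     (1u << 1)
#define DHCSR_C_MASKINTS (1u << 3)

/*
 * Value written in step 1 (halt only) or step 2 (halt + mask
 * interrupts). Hypothetical helper for illustration.
 */
uint32_t dhcsr_value(int step)
{
    assert(step == 1 || step == 2);
    uint32_t v = DHCSR_DBGKEY | DHCSR_C_DEBUGEN | DHCSR_C_HALT;
    if (step == 2) {
        v |= DHCSR_C_MASKINTS;   /* only added once the core is halted */
    }
    return v;
}
```

Step 1 yields 0xA05F0003 and step 2 yields 0xA05F000B. The low bits of the second value are the same 0xB that the error path later expects to find when it compares DHCSR against 0x3000B.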

Writing the loader to SRAM

stlink_flash_loader_write_to_sram(...)
fl->buf_addr = fl->loader_addr + size;

loader_addr = sram_base, and a data buffer is placed immediately after the loader to hold the firmware chunk being written.

Choosing the watchdog address and clearing the fault registers

DFSR/CFSR/HFSR

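These are the ARM fault status registers. They are cleared so that stale error state is gone before the loader runs; if the loader then fails, the fault that is read back is the real cause of this run.

loader_v_dependent_assignment(): why read the target voltage?

ST-LINK V2/V3 can read the target voltage (V1 cannot, so the code falls back to the 32-bit loader there). Above 2700 mV a higher programming parallelism is allowed (32-bit writes); at lower voltages a more conservative mode (8-bit writes) is required, otherwise Flash programming may fail.

So for families such as F4/F7 it picks one of loader_code_stm32f4 and loader_code_stm32f4_lv (or the F7 equivalents).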
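
The selection rule is small enough to write down directly. The sketch below mirrors the threshold and the PSIZE values (0 for 8-bit, 2 for 32-bit) that appear in the quoted source; the enum and the function name `psize_for_voltage` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* PSIZE encodings as used in the quoted source: 0 = 8-bit, 2 = 32-bit. */
enum { PSIZE_X8 = 0, PSIZE_X32 = 2 };

/*
 * Pick the programming parallelism from the target voltage in mV.
 * Hypothetical helper; a failed voltage read (-1) must be handled
 * by the caller before getting here.
 */
int psize_for_voltage(int32_t millivolts)
{
    assert(millivolts >= 0);
    return (millivolts > 2700) ? PSIZE_X32 : PSIZE_X8;
}
```

Note that the comparison is strict: a reading of exactly 2700 mV still selects the 8-bit path.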

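stlink_flash_loader_write_to_sram(): pick a loader for the chip, then write it to SRAM

This very long if/else chain does two things:

  1. Select the loader_code pointer and size based on chip_id / core_id / flash_type
  2. memcpy(sl->q_buf, loader_code, loader_size), then stlink_write_mem32(sl, sl->sram_base, loader_size)

Note what the comment stresses:

The loader binary size must be a multiple of 4 bytes, because it is written with stlink_write_mem32 (32-bit writes)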

stlink_flash_loader_run(): the full sequence of one write

This is the heart of the file.

First, write the data to the SRAM buffer

write_buffer_to_sram(sl, fl, buf, size, padded_size)

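This is paired with an important padding step: pad_modulo defaults to 1; if flash_type == WB0, pad_modulo = 16, and when the size is not a multiple of it, padded_size is rounded up to the next multiple.

Some families must be written in units of their write granularity, otherwise the loader's block-wise writes would overrun or fail.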
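
The padding arithmetic, as a minimal sketch. The function name `padded_size` is hypothetical, but the computation follows the one in stlink_flash_loader_run().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round size up to the next multiple of modulo.
 * modulo = 1 means "no padding". Hypothetical helper.
 */
uint32_t padded_size(uint32_t size, uint32_t modulo)
{
    assert(modulo != 0);
    uint32_t rem = size % modulo;
    return rem ? size + (modulo - rem) : size;
}
```

With pad_modulo = 16, a 20-byte chunk is padded to 32 bytes, while a 32-byte chunk is left alone. The quoted source keeps this value in a uint16_t, which appears to be sufficient because chunks are capped at 0x8000 bytes elsewhere in the file.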

Handling bank 2 on F1_XL

if(F1_XL && target >= 0x08080000) flash_base = 0x40;

This value goes into R3, and the loader uses it as the Flash register base offset.
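
Written as a standalone function, the bank decision looks like this. The two macros carry the values from the source; `flash_reg_offset` and its boolean parameter are hypothetical simplifications of the flash_type check.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLASH_REGS_BANK2_OFS   0x40u
#define FLASH_BANK2_START_ADDR 0x08080000u

/*
 * Value handed to the loader in R3: 0 for bank 1, 0x40 for bank 2.
 * Only F1_XL parts ever get a non-zero offset. Hypothetical helper.
 */
uint32_t flash_reg_offset(bool is_f1_xl, uint32_t target)
{
    if (is_f1_xl && target >= FLASH_BANK2_START_ADDR) {
        return FLASH_REGS_BANK2_OFS;
    }
    return 0;
}
```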

Set the core registers, then point the PC at the loader

R0 = buf_addr   // source
R1 = target     // target flash address
R2 = padded_size// count
R3 = flash_base // flash reg base offset
PC = loader_addr

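Over SWD, the PC side is remotely driving the target CPU's registers, arranging them like the entry state of a function call.

Feed the watchdog (optional)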

stlink_write_debug32(iwdg_kr, 0xAAAA);

Run loader

stlink_run(sl, RUN_FLASH_LOADER);

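This makes the core start executing at the PC, which now points into SRAM, until it hits the BKPT and halts.

Waiting for the loader to finish: polling for halt

The code uses a 500 ms timeout and loops, sleeping 10 ms per iteration and checking stlink_is_core_halted(sl). The preceding comment explains why it no longer sleeps 10 µs: on Unix-like systems a 10 µs sleep is rounded up to the scheduler tick (about 1 ms), so the original 10000 rounds could take around 20 seconds before the error was reported.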
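
The wait loop has a simple shape: compute a deadline, sleep, poll, repeat. The sketch below reproduces that shape with the clock and the sleep passed in as function pointers, so it can be exercised on a host without real delays. Everything here is hypothetical scaffolding: `poll_until`, the fake clock, and `halted_after` (a stand-in for stlink_is_core_halted()) are not part of stlink.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fake clock for the demonstration: time only advances when we "sleep". */
static uint32_t fake_now;
uint32_t fake_time_ms(void)         { return fake_now; }
void     fake_sleep_ms(uint32_t ms) { fake_now += ms; }

/*
 * Sleep poll_ms, then call is_done(ctx); repeat until it reports true
 * or timeout_ms has elapsed. Returns 0 on completion, -1 on timeout.
 */
int poll_until(bool (*is_done)(void *), void *ctx,
               uint32_t (*time_ms)(void), void (*sleep_ms)(uint32_t),
               uint32_t timeout_ms, uint32_t poll_ms)
{
    assert(is_done != NULL && time_ms != NULL && sleep_ms != NULL);
    assert(poll_ms > 0);

    uint32_t deadline = time_ms() + timeout_ms;
    while (time_ms() < deadline) {
        sleep_ms(poll_ms);
        if (is_done(ctx)) {
            return 0;
        }
    }
    return -1;
}

/* Stand-in for "core halted?": reports true on the N-th poll. */
bool halted_after(void *ctx)
{
    int *polls_left = ctx;
    return --(*polls_left) <= 0;
}
```

With a 500 ms timeout and a 10 ms poll interval, the loop gives up after at most 50 polls, which is the behavior the comment in the source is aiming for.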

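Using R2 to check the remaining byte count

The loader subtracts the size of each block it writes from R2, so when it stops:

Ideal: R2 <= 0 (everything was written, possibly slightly past the end because of the block size)

R2 > 0: the loader stopped before it finished

R2 < -7: more bytes were written than the block size can explain (a block is 1 to 8 bytes, so a normal overshoot is at most 7)

Therefore: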

if (rr.r[2] > 0 || rr.r[2] < -7) -> error
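
To see where a negative R2 comes from, here is a host-side model of a word-copy loader loop. It is modelled on the loader_code_stm32lx bytes in the listing, which decode to: load a word from [R0], store it to [R1], add 4 to both pointers, subtract 4 from R2, and branch back while the result is greater than zero. The function names are hypothetical, and the model ignores Flash busy polling entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Model of the copy loop: move 4 bytes, subtract 4 from the counter,
 * repeat while the counter is still positive. Returns the final
 * counter, i.e. what the host would read back from R2.
 * dst and src need room for count rounded up to a multiple of 4.
 */
int32_t run_word_loader(uint8_t *dst, const uint8_t *src, int32_t count)
{
    assert(count > 0);
    do {
        memcpy(dst, src, 4);   /* stands in for one 32-bit Flash write */
        dst += 4;
        src += 4;
        count -= 4;
    } while (count > 0);
    return count;
}

/* The host-side acceptance check applied to R2 after the halt. */
bool r2_is_ok(int32_t r2)
{
    return r2 <= 0 && r2 >= -7;
}
```

A 10-byte request takes three iterations and leaves R2 at -2, which the check accepts. Loaders that write 8 bytes per iteration can overshoot by up to 7, hence the lower bound.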

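On error: read DHCSR/DFSR/CFSR/HFSR and all core registers

This prints where the loader is stuck, which fault occurred and what arguments were passed in the registers, which makes diagnosis much easier: did the code run away, were the parameters right, was there a HardFault or BusFault? Looking at PC, R0/R1/R2/R3 and CFSR/HFSR together is usually enough.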

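set_dma_state(): turn DMA off before writing Flash, restore it afterwards

It uses a different RCC register and mask per family to gate the DMA clock. DMA may be reading or writing SRAM and peripherals in the background and contending for the bus, which in some cases affects the timing or stability of Flash writes. More importantly, a DMA transfer that touches Flash or related resources can cause unpredictable behavior.

So this code backs up the DMA enable bits, clears them, and restores them at the end.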
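
The backup, clear and restore sequence can be sketched against an ordinary variable that stands in for the RCC enable register. The function names are hypothetical; the masking logic follows set_dma_state() in the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clear the DMA enable bits in *reg and return the bits that were set. */
uint32_t dma_clock_disable(uint32_t *reg, uint32_t dma_mask)
{
    assert(reg != NULL);
    uint32_t backup = *reg & dma_mask;
    *reg &= ~dma_mask;
    return backup;
}

/* Put back exactly the DMA enable bits saved by dma_clock_disable(). */
void dma_clock_restore(uint32_t *reg, uint32_t dma_mask, uint32_t backup)
{
    assert(reg != NULL);
    *reg = (*reg & ~dma_mask) | (backup & dma_mask);
}
```

Only the masked bits are touched in either direction, so any other clock enables that change while the loader runs are preserved.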

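stlink_flashloader_write(): the actual data write (implemented per family)

Several approaches show up:

Loader-based: F2/F4/F7/L4/WB0 (Flash region) write in chunks (as large as possible, but no larger than the SRAM buffer)

Direct debug32 writes to Flash addresses: WB/G0/G4/L5... (written one word at a time, waiting on busy in between)

L0/L1: half-page writes (falling back to a soft write if the loader fails), involving the PECR bits

F0/F1/F3: loop by page size, use the loader per page, lock after each page

H7: written in 64-byte units (64 bytes per write mem32)

WB0 OTP: cannot use the burst loader and must be written word by word; out-of-bounds bytes are filled with 0xFF so they stay unchanged

Each one reflects that family's minimum write granularity, register mechanism and constraints.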
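
For the loader-based path, the chunk size follows from the SRAM size. The sketch below restates the two lines of arithmetic from the source (SRAM size minus 0x1000, capped at 0x8000) and derives how many loader runs a given image needs. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Buffer for one loader run: SRAM minus 4 KiB, capped at 32 KiB. */
uint32_t loader_buf_size(uint32_t sram_size)
{
    assert(sram_size > 0x1000u);
    uint32_t buf = sram_size - 0x1000u;
    return buf > 0x8000u ? 0x8000u : buf;
}

/* Number of loader runs needed to write len bytes. */
uint32_t loader_runs(uint32_t len, uint32_t sram_size)
{
    uint32_t buf = loader_buf_size(sram_size);
    return (len + buf - 1u) / buf;   /* ceiling division */
}
```

On a part with 128 KiB of SRAM the buffer is capped at 32 KiB, so a 64 KiB + 1 byte image takes three runs. Each run costs one round of register setup and polling over SWD, which is why larger chunks translate into faster flashing.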

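stlink_flashloader_stop(): cleanup

Clear the PG bit, lock Flash, clear MASKINTS in DHCSR (re-enabling interrupts) and restore the DMA state. Together these steps bring the target MCU out of Flash-programming mode and back to a debug state in which it can run normally.

Postscript

It all feels remarkably precise, doesn't it? That is what engineered things look like.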

/*
 * File: flash_loader.c
 *
 * Flash loaders
 */

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#include <stm32.h>
#include <stm32_register.h>
#include <stlink.h>

#include "flash_loader.h"
#include "common_flash.h"
#include "helper.h"
#include "logging.h"
#include "read_write.h"

#define FLASH_REGS_BANK2_OFS      0x40
#define FLASH_BANK2_START_ADDR    0x08080000

#define STM32F0_WDG_KR            0x40003000
#define STM32H7_WDG_KR            0x58004800

#define STM32F0_WDG_KR_KEY_RELOAD 0xAAAA

/*
 * !!! DO NOT MODIFY FLASH LOADERS DIRECTLY !!!
 *
 * Edit assembly files in the '/flashloaders' instead. The sizes of binary
 * flash loaders must be aligned by 4 (it's written by stlink_write_mem32)
 */

// flashloaders/stm32f0.s -- compiled with thumb2
static const uint8_t loader_code_stm32vl[] = {
    0x00, 0xbf, 0x00, 0xbf,
    0x09, 0x4f, 0x1f, 0x44,
    0x09, 0x4d, 0x3d, 0x44,
    0x04, 0x88, 0x0c, 0x80,
    0x02, 0x30, 0x02, 0x31,
    0x4f, 0xf0, 0x01, 0x07,
    0x2c, 0x68, 0x3c, 0x42,
    0xfc, 0xd1, 0x4f, 0xf0,
    0x14, 0x07, 0x3c, 0x42,
    0x01, 0xd1, 0x02, 0x3a,
    0xf0, 0xdc, 0x00, 0xbe,
    0x00, 0x20, 0x02, 0x40,
    0x0c, 0x00, 0x00, 0x00
};

// flashloaders/stm32f0.s -- thumb1 only, same sequence as for STM32VL, bank ignored
static const uint8_t loader_code_stm32f0[] = {
    0xc0, 0x46, 0xc0, 0x46,
    0x08, 0x4f, 0x1f, 0x44,
    0x08, 0x4d, 0x3d, 0x44,
    0x04, 0x88, 0x0c, 0x80,
    0x02, 0x30, 0x02, 0x31,
    0x06, 0x4f, 0x2c, 0x68,
    0x3c, 0x42, 0xfc, 0xd1,
    0x05, 0x4f, 0x3c, 0x42,
    0x01, 0xd1, 0x02, 0x3a,
    0xf2, 0xdc, 0x00, 0xbe,
    0x00, 0x20, 0x02, 0x40,
    0x0c, 0x00, 0x00, 0x00,
    0x01, 0x00, 0x00, 0x00,
    0x14, 0x00, 0x00, 0x00
};

// flashloaders/stm32wb0.s
static const uint8_t loader_code_stm32wb0[] = {
    0x0b, 0x4b, 0xc9, 0x1a,
    0x89, 0x08, 0x09, 0x4f,
    0xb9, 0x61, 0x40, 0x37,
    0x78, 0xc8, 0x78, 0xc7,
    0x50, 0x3f, 0x0d, 0x25,
    0x3d, 0x61, 0xcc, 0x24,
    0x3c, 0x60, 0x3c, 0x69,
    0x2c, 0x40, 0xfc, 0xd0,
    0x64, 0x08, 0x02, 0xd1,
    0x04, 0x31, 0x10, 0x3a,
    0xee, 0xdc, 0x00, 0xbe,
    0x00, 0x10, 0x00, 0x40,
    0x00, 0x00, 0x04, 0x10
};

// flashloaders/stm32lx.s -- compiled for armv6-m for compatibility with both
// armv6-m cores (STM32L0) and armv7-m cores (STM32L1)
static const uint8_t loader_code_stm32lx[] = {
    0x04, 0x68, 0x0c, 0x60,
    0x04, 0x30, 0x04, 0x31,
    0x04, 0x3a, 0xf9, 0xdc,
    0x00, 0xbe, 0x00, 0x00
};

// flashloaders/stm32f4.s
static const uint8_t loader_code_stm32f4[] = {
    0xdf, 0xf8, 0x24, 0xc0,
    0xdf, 0xf8, 0x24, 0xa0,
    0xe2, 0x44, 0x04, 0x68,
    0x0c, 0x60, 0x00, 0xf1,
    0x04, 0x00, 0x01, 0xf1,
    0x04, 0x01, 0xba, 0xf8,
    0x00, 0x40, 0x14, 0xf0,
    0x01, 0x0f, 0xfa, 0xd1,
    0x04, 0x3a, 0xf2, 0xdc,
    0x00, 0xbe, 0x00, 0xbf,
    0x00, 0x3c, 0x02, 0x40,
    0x0e, 0x00, 0x00, 0x00
};

// flashloaders/stm32f4lv.s
static const uint8_t loader_code_stm32f4_lv[] = {
    0xdf, 0xf8, 0x24, 0xc0,
    0xdf, 0xf8, 0x24, 0xa0,
    0xe2, 0x44, 0x04, 0x78,
    0x0c, 0x70, 0x00, 0xf1,
    0x01, 0x00, 0x01, 0xf1,
    0x01, 0x01, 0xba, 0xf8,
    0x00, 0x40, 0x14, 0xf0,
    0x01, 0x0f, 0xfa, 0xd1,
    0x01, 0x3a, 0xf2, 0xdc,
    0x00, 0xbe, 0x00, 0xbf,
    0x00, 0x3c, 0x02, 0x40,
    0x0e, 0x00, 0x00, 0x00
};

// flashloaders/stm32l4.s
static const uint8_t loader_code_stm32l4[] = {
    0xdf, 0xf8, 0x28, 0xc0,
    0xdf, 0xf8, 0x28, 0xa0,
    0xe2, 0x44, 0x05, 0x68,
    0x44, 0x68, 0x0d, 0x60,
    0x4c, 0x60, 0x00, 0xf1,
    0x08, 0x00, 0x01, 0xf1,
    0x08, 0x01, 0xda, 0xf8,
    0x00, 0x40, 0x14, 0xf4,
    0x80, 0x3f, 0xfa, 0xd1,
    0x08, 0x3a, 0xf0, 0xdc,
    0x00, 0xbe, 0x00, 0xbf,
    0x00, 0x20, 0x02, 0x40,
    0x10, 0x00, 0x00, 0x00
};

// flashloaders/stm32f7.s
static const uint8_t loader_code_stm32f7[] = {
    0xdf, 0xf8, 0x28, 0xc0,
    0xdf, 0xf8, 0x28, 0xa0,
    0xe2, 0x44, 0x04, 0x68,
    0x0c, 0x60, 0x00, 0xf1,
    0x04, 0x00, 0x01, 0xf1,
    0x04, 0x01, 0xbf, 0xf3,
    0x4f, 0x8f, 0xba, 0xf8,
    0x00, 0x40, 0x14, 0xf0,
    0x01, 0x0f, 0xfa, 0xd1,
    0x04, 0x3a, 0xf0, 0xdc,
    0x00, 0xbe, 0x00, 0xbf,
    0x00, 0x3c, 0x02, 0x40,
    0x0e, 0x00, 0x00, 0x00
};

// flashloaders/stm32f7lv.s
static const uint8_t loader_code_stm32f7_lv[] = {
    0xdf, 0xf8, 0x28, 0xc0,
    0xdf, 0xf8, 0x28, 0xa0,
    0xe2, 0x44, 0x04, 0x78,
    0x0c, 0x70, 0x00, 0xf1,
    0x01, 0x00, 0x01, 0xf1,
    0x01, 0x01, 0xbf, 0xf3,
    0x4f, 0x8f, 0xba, 0xf8,
    0x00, 0x40, 0x14, 0xf0,
    0x01, 0x0f, 0xfa, 0xd1,
    0x01, 0x3a, 0xf0, 0xdc,
    0x00, 0xbe, 0x00, 0xbf,
    0x00, 0x3c, 0x02, 0x40,
    0x0e, 0x00, 0x00, 0x00
};


int32_t stlink_flash_loader_init(stlink_t *sl, flash_loader_t *fl) {
    uint32_t size = 0;
    uint32_t dfsr, cfsr, hfsr;

    /* Interrupt masking according to DDI0419C, Table C1-7 firstly force halt */
    stlink_write_debug32(sl, STM32_REG_DHCSR,
                           STM32_REG_DHCSR_DBGKEY | STM32_REG_DHCSR_C_DEBUGEN |
                           STM32_REG_DHCSR_C_HALT);
    /* and only then disable interrupts */
    stlink_write_debug32(sl, STM32_REG_DHCSR,
                           STM32_REG_DHCSR_DBGKEY | STM32_REG_DHCSR_C_DEBUGEN |
                           STM32_REG_DHCSR_C_HALT | STM32_REG_DHCSR_C_MASKINTS);

    // allocate the loader in SRAM
    if(stlink_flash_loader_write_to_sram(sl, &fl->loader_addr, &size) == -1) {
        WLOG("Failed to write flash loader to sram!\n");
        return (-1);
    }

    // allocate a one page buffer in SRAM right after loader
    fl->buf_addr = fl->loader_addr + size;
    ILOG("Successfully loaded flash loader in sram\n");

    // set address of IWDG key register for reset it
    if(sl->flash_type == STM32_FLASH_TYPE_H7) {
        fl->iwdg_kr = STM32H7_WDG_KR;
    } else {
        fl->iwdg_kr = STM32F0_WDG_KR;
    }

    /* Clear Fault Status Register for handling flash loader error */
    if(!stlink_read_debug32(sl, STM32_REG_DFSR, &dfsr) && dfsr) {
        ILOG("Clear DFSR\n");
        stlink_write_debug32(sl, STM32_REG_DFSR, dfsr);
    }
    if(!stlink_read_debug32(sl, STM32_REG_CFSR, &cfsr) && cfsr) {
        ILOG("Clear CFSR\n");
        stlink_write_debug32(sl, STM32_REG_CFSR, cfsr);
    }
    if(!stlink_read_debug32(sl, STM32_REG_HFSR, &hfsr) && hfsr) {
        ILOG("Clear HFSR\n");
        stlink_write_debug32(sl, STM32_REG_HFSR, hfsr);
    }

    return (0);
}

static int32_t loader_v_dependent_assignment(stlink_t *sl,
                                            const uint8_t **loader_code, uint32_t *loader_size,
                                            const uint8_t *high_v_loader, uint32_t high_v_loader_size,
                                            const uint8_t *low_v_loader, uint32_t low_v_loader_size) {
    int32_t retval = 0;

    if( sl->version.stlink_v == 1) {
        printf("STLINK V1 cannot read voltage, defaulting to 32-bit writes\n");
        *loader_code = high_v_loader;
        *loader_size = high_v_loader_size;
    } else {
        int32_t voltage = stlink_target_voltage(sl);

        if(voltage == -1) {
            retval = -1;
            printf("Failed to read Target voltage\n");
        } else   {
            if(voltage > 2700) {
                *loader_code = high_v_loader;
                *loader_size = high_v_loader_size;
            } else {
                *loader_code = low_v_loader;
                *loader_size = low_v_loader_size;
            }
        }
    }

    return (retval);
}

int32_t stlink_flash_loader_write_to_sram(stlink_t *sl, stm32_addr_t* addr, uint32_t* size) {
    const uint8_t* loader_code;
    uint32_t loader_size;

    if(sl->chip_id == STM32_CHIPID_L1_MD ||
        sl->chip_id == STM32_CHIPID_L1_CAT2 ||
        sl->chip_id == STM32_CHIPID_L1_MD_PLUS ||
        sl->chip_id == STM32_CHIPID_L1_MD_PLUS_HD ||
        sl->chip_id == STM32_CHIPID_L152_RE ||
        sl->chip_id == STM32_CHIPID_L0_CAT1 ||
        sl->chip_id == STM32_CHIPID_L0_CAT2 ||
        sl->chip_id == STM32_CHIPID_L0_CAT3 ||
        sl->chip_id == STM32_CHIPID_L0_CAT5) {
        loader_code = loader_code_stm32lx;
        loader_size = sizeof(loader_code_stm32lx);
    } else if(sl->core_id == STM32_CORE_ID_M3_r1p1_SWD ||
               sl->chip_id == STM32_CHIPID_F1_MD ||
               sl->chip_id == STM32_CHIPID_F1_HD ||
               sl->chip_id == STM32_CHIPID_F1_LD ||
               sl->chip_id == STM32_CHIPID_F1_VL_MD_LD ||
               sl->chip_id == STM32_CHIPID_F1_VL_HD ||
               sl->chip_id == STM32_CHIPID_F1_XLD ||
               sl->chip_id == STM32_CHIPID_F1_CONN ||
               sl->chip_id == STM32_CHIPID_F3 ||
               sl->chip_id == STM32_CHIPID_F3xx_SMALL ||
               sl->chip_id == STM32_CHIPID_F303_HD ||
               sl->chip_id == STM32_CHIPID_F37x ||
               sl->chip_id == STM32_CHIPID_F334) {
        loader_code = loader_code_stm32vl;
        loader_size = sizeof(loader_code_stm32vl);
    } else if(sl->chip_id == STM32_CHIPID_F2 ||
               sl->chip_id == STM32_CHIPID_F4 ||
               sl->chip_id == STM32_CHIPID_F4_DE ||
               sl->chip_id == STM32_CHIPID_F4_LP ||
               sl->chip_id == STM32_CHIPID_F4_HD ||
               sl->chip_id == STM32_CHIPID_F4_DSI ||
               sl->chip_id == STM32_CHIPID_F410 ||
               sl->chip_id == STM32_CHIPID_F411xx ||
               sl->chip_id == STM32_CHIPID_F412 ||
               sl->chip_id == STM32_CHIPID_F413 ||
               sl->chip_id == STM32_CHIPID_F446) {
        int32_t retval;
        retval = loader_v_dependent_assignment(sl,
                                               &loader_code, &loader_size,
                                               loader_code_stm32f4, sizeof(loader_code_stm32f4),
                                               loader_code_stm32f4_lv, sizeof(loader_code_stm32f4_lv));

        if(retval == -1) { return (retval); }
    } else if(sl->core_id == STM32_CORE_ID_M7F_SWD ||
               sl->chip_id == STM32_CHIPID_F7 ||
               sl->chip_id == STM32_CHIPID_F76xxx ||
               sl->chip_id == STM32_CHIPID_F72xxx) {
        int32_t retval;
        retval = loader_v_dependent_assignment(sl,
                                               &loader_code, &loader_size,
                                               loader_code_stm32f7, sizeof(loader_code_stm32f7),
                                               loader_code_stm32f7_lv, sizeof(loader_code_stm32f7_lv));

        if(retval == -1) { return (retval); }
    } else if(sl->chip_id == STM32_CHIPID_F0 ||
               sl->chip_id == STM32_CHIPID_F04 ||
               sl->chip_id == STM32_CHIPID_F0_CAN ||
               sl->chip_id == STM32_CHIPID_F0xx_SMALL ||
               sl->chip_id == STM32_CHIPID_F09x) {
        loader_code = loader_code_stm32f0;
        loader_size = sizeof(loader_code_stm32f0);
    } else if(sl->flash_type == STM32_FLASH_TYPE_WB0) {
        loader_code = loader_code_stm32wb0;
        loader_size = sizeof(loader_code_stm32wb0);
    } else if((sl->chip_id == STM32_CHIPID_L4) ||
               (sl->chip_id == STM32_CHIPID_L41x_L42x) ||
               (sl->chip_id == STM32_CHIPID_L43x_L44x) ||
               (sl->chip_id == STM32_CHIPID_L45x_L46x) ||
               (sl->chip_id == STM32_CHIPID_L4PX) ||
               (sl->chip_id == STM32_CHIPID_L4Rx) ||
               (sl->chip_id == STM32_CHIPID_L496x_L4A6x)) {
        loader_code = loader_code_stm32l4;
        loader_size = sizeof(loader_code_stm32l4);
    } else {
        ELOG("unknown coreid, not sure what flash loader to use, aborting! coreid: %x, chipid: %x\n",
            sl->core_id, sl->chip_id);
        return (-1);
    }

    memcpy(sl->q_buf, loader_code, loader_size);
    int32_t ret = stlink_write_mem32(sl, sl->sram_base, (uint16_t) loader_size);

    if(ret) { return (ret); }

    *addr = sl->sram_base;
    *size = loader_size;

    return (0); // success
}

int32_t stlink_flash_loader_run(stlink_t *sl, flash_loader_t* fl, stm32_addr_t target, const uint8_t* buf, uint32_t size) {
    struct stlink_reg rr;
    uint32_t timeout;
    uint32_t flash_base = 0;
    uint32_t dhcsr, dfsr, cfsr, hfsr;
    uint16_t padded_size = size, pad_modulo = 1;
    
    if(sl->flash_type == STM32_FLASH_TYPE_WB0) {
        pad_modulo = 16;
    }
    uint16_t unaligned_cnt = size % pad_modulo;
    if (unaligned_cnt) {
      padded_size = size + (pad_modulo - unaligned_cnt);
    }

    DLOG("Running flash loader, write address:%#x, size: %u, padded_size: %u\n", target, size, padded_size);

    if(write_buffer_to_sram(sl, fl, buf, size, padded_size) == -1) {
        ELOG("write_buffer_to_sram() == -1\n");
        return (-1);
    }

    if((sl->flash_type == STM32_FLASH_TYPE_F1_XL) && (target >= FLASH_BANK2_START_ADDR)) {
        flash_base = FLASH_REGS_BANK2_OFS;
    }

/* Setup core */
  stlink_write_reg(sl, fl->buf_addr, 0);     // source
  stlink_write_reg(sl, target, 1);           // target
  stlink_write_reg(sl, padded_size, 2);      // count
  stlink_write_reg(sl, flash_base, 3);       // flash register base
                                             // only used on VL/F1_XL, but harmless for others
  stlink_write_reg(sl, fl->loader_addr, 15); // pc register

/* Reset IWDG */
if(fl->iwdg_kr) {
      stlink_write_debug32(sl, fl->iwdg_kr, STM32F0_WDG_KR_KEY_RELOAD);
  }

/* Run loader */
  stlink_run(sl, RUN_FLASH_LOADER);

/*
 * This piece of code used to try to spin for .1 second by waiting doing 10000 rounds of 10 µs.
 * But because this usually runs on Unix-like OSes, the 10 µs get rounded up to the "tick"
 * (actually almost two ticks) of the system. 1 ms. Thus, the ten thousand attempts, when
 * "something goes wrong" that requires the error message "flash loader run error" would wait
 * for something like 20 seconds before coming up with the error.
 * By increasing the sleep-per-round to the same order-of-magnitude as the tick-rounding that
 * the OS uses, the wait until the error message is reduced to the same order of magnitude
 * as what was intended. -- REW.
 */

// wait until done (reaches breakpoint)
  timeout = time_ms() + 500;
while (time_ms() < timeout) {
      usleep(10000);

      if(stlink_is_core_halted(sl)) {
          timeout = 0;
          break;
      }
  }

if(timeout) {
      ELOG("Flash loader run error\n");
      goto error;
  }

// check written byte count
  stlink_read_reg(sl, 2, &rr);

/*
  * The chunk size for loading is not rounded. The flash loader
  * subtracts the size of the written block (1-8 bytes) from
  * the remaining size each time. A negative value may mean that
  * several bytes garbage have been written due to the unaligned
  * firmware size.
  */
if((int32_t) rr.r[2] > 0 || (int32_t) rr.r[2] < -7) {
      ELOG("Flash loader write error at 0x%08X\n", target + padded_size - rr.r[2]);
      goto error;
  }

return (0);

  error:
      dhcsr = dfsr = cfsr = hfsr = 0;
      stlink_force_debug(sl); // ensure we can read regs, even after timeout
      stlink_read_debug32(sl, STM32_REG_DHCSR, &dhcsr);
      stlink_read_debug32(sl, STM32_REG_DFSR, &dfsr);
      stlink_read_debug32(sl, STM32_REG_CFSR, &cfsr);
      stlink_read_debug32(sl, STM32_REG_HFSR, &hfsr);
      stlink_read_all_regs(sl, &rr);

      WLOG("Loader state: PC 0x%08X\n", rr.r[15]);
      WLOG("              R0 0x%08X R1 0x%08X\n", rr.r[0], rr.r[1]);
      WLOG("              R2 0x%08X R3 0x%08X\n", rr.r[2], rr.r[3]);
      WLOG("              R4 0x%08X R5 0x%08X\n", rr.r[4], rr.r[5]);
      WLOG("              R6 0x%08X R7 0x%08X\n", rr.r[6], rr.r[7]);
      if(dhcsr != 0x3000B || dfsr || cfsr || hfsr) {
          WLOG("MCU state: DHCSR 0x%X DFSR 0x%X CFSR 0x%X HFSR 0x%X\n", dhcsr, dfsr, cfsr, hfsr);
      }

return (-1);
}


/* === Content from old source file flashloader.c === */

#define L1_WRITE_BLOCK_SIZE 0x80
#define L0_WRITE_BLOCK_SIZE 0x40

int32_t stm32l1_write_half_pages(stlink_t *sl, flash_loader_t *fl, stm32_addr_t addr, uint8_t *base, uint32_t len, uint32_t pagesize) {
uint32_t count, off;
uint32_t num_half_pages = len / pagesize;
uint32_t val;
uint32_t flash_regs_base = get_stm32l0_flash_base(sl);
bool use_loader = true;
int32_t ret = 0;

// enable half page write
  stlink_read_debug32(sl, flash_regs_base + FLASH_PECR_OFF, &val);
  val |= (1 << STM32_FLASH_L1_FPRG);
  stlink_write_debug32(sl, flash_regs_base + FLASH_PECR_OFF, val);
  val |= (1 << STM32_FLASH_L1_PROG);
  stlink_write_debug32(sl, flash_regs_base + FLASH_PECR_OFF, val);

  wait_flash_busy(sl);

for(count = 0; count < num_half_pages; count++) {
    if(use_loader) {
      ret = stlink_flash_loader_run(sl, fl, addr + count * pagesize, base + count * pagesize, pagesize);
      if(ret && count == 0) {
        /* It seems that stm32lx devices have a problem when it is blank */
        WLOG("Failed to use flash loader, fallback to soft write\n");
        use_loader = false;
      }
    }
    if(!use_loader) {
      ret = 0;
      for(off = 0; off < pagesize && !ret; off += 64) {
        uint32_t chunk = (pagesize - off > 64) ? 64 : pagesize - off;
        memcpy(sl->q_buf, base + count * pagesize + off, chunk);
        ret = stlink_write_mem32(sl, addr + count * pagesize + off, (uint16_t) chunk);
      }
    }

    if(ret) {
      WLOG("l1_stlink_flash_loader_run(%#x) failed! == -1\n", addr + count * pagesize);
      break;
    }

    if(sl->verbose >= 1) {
      // show progress; writing procedure is slow and previous errors are misleading
      fprintf(stdout, "%3u/%3u halfpages written\n", count + 1, num_half_pages);
      fflush(stdout);
    }

    // wait for sr.busy to be cleared
    wait_flash_busy(sl);
  }

// disable half page write
  stlink_read_debug32(sl, flash_regs_base + FLASH_PECR_OFF, &val);
  val &= ~((1 << STM32_FLASH_L1_FPRG) | (1 << STM32_FLASH_L1_PROG));
  stlink_write_debug32(sl, flash_regs_base + FLASH_PECR_OFF, val);
return (ret);
}

static void set_flash_cr_pg(stlink_t *sl, uint32_t bank) {
uint32_t cr_reg, x;

  x = read_flash_cr(sl, bank);

if(sl->flash_type == STM32_FLASH_TYPE_C0) {
    cr_reg = STM32_FLASH_C0_CR;
    x |= (1 << FLASH_CR_PG);
  } else if(sl->flash_type == STM32_FLASH_TYPE_F2_F4) {
    cr_reg = STM32_FLASH_F4_CR;
    x |= (1 << FLASH_CR_PG);
  } else if(sl->flash_type == STM32_FLASH_TYPE_F7) {
    cr_reg = STM32_FLASH_F7_CR;
    x |= (1 << FLASH_CR_PG);
  } else if(sl->flash_type == STM32_FLASH_TYPE_L4) {
    cr_reg = STM32_FLASH_L4_CR;
    x &= ~STM32_FLASH_L4_CR_OPBITS;
    x |= (1 << STM32_FLASH_L4_CR_PG);
  } else if(sl->flash_type == STM32_FLASH_TYPE_L5_U5_H5) {
    cr_reg = STM32_FLASH_L5_NSCR;
    x |= (1 << FLASH_CR_PG);
  } else if(sl->flash_type == STM32_FLASH_TYPE_G0 ||
             sl->flash_type == STM32_FLASH_TYPE_G4) {
    cr_reg = STM32_FLASH_Gx_CR;
    x |= (1 << FLASH_CR_PG);
  } else if(sl->flash_type == STM32_FLASH_TYPE_WB_WL) {
    cr_reg = STM32_FLASH_WB_CR;
    x |= (1 << FLASH_CR_PG);
  } else if(sl->flash_type == STM32_FLASH_TYPE_H7) {
    cr_reg = (bank == BANK_1) ? STM32_FLASH_H7_CR1 : STM32_FLASH_H7_CR2;
    x |= (1 << STM32_FLASH_H7_CR_PG);
  } else {
    cr_reg = (bank == BANK_1) ? FLASH_CR : FLASH_CR2;
    x = (1 << FLASH_CR_PG);
  }

  stlink_write_debug32(sl, cr_reg, x);
}

static void set_dma_state(stlink_t *sl, flash_loader_t *fl, int32_t bckpRstr) {
uint32_t rcc, rcc_dma_mask, value;

  rcc = rcc_dma_mask = value = 0;

switch (sl->flash_type) {
case STM32_FLASH_TYPE_C0:
    rcc = STM32C0_RCC_AHBENR;
    rcc_dma_mask = STM32C0_RCC_DMAEN;
    break;
case STM32_FLASH_TYPE_F0_F1_F3:
case STM32_FLASH_TYPE_F1_XL:
    rcc = STM32F1_RCC_AHBENR;
    rcc_dma_mask = STM32F1_RCC_DMAEN;
    break;
case STM32_FLASH_TYPE_F2_F4:
case STM32_FLASH_TYPE_F7:
    rcc = STM32F4_RCC_AHB1ENR;
    rcc_dma_mask = STM32F4_RCC_DMAEN;
    break;
case STM32_FLASH_TYPE_G0:
    rcc = STM32G0_RCC_AHBENR;
    rcc_dma_mask = STM32G0_RCC_DMAEN;
    break;
case STM32_FLASH_TYPE_G4:
case STM32_FLASH_TYPE_L4:
    rcc = STM32G4_RCC_AHB1ENR;
    rcc_dma_mask = STM32G4_RCC_DMAEN;
    break;
case STM32_FLASH_TYPE_L0_L1:
    if(get_stm32l0_flash_base(sl) == STM32_FLASH_Lx_REGS_ADDR) {
      rcc = STM32L1_RCC_AHBENR;
      rcc_dma_mask = STM32L1_RCC_DMAEN;
    } else {
      rcc = STM32L0_RCC_AHBENR;
      rcc_dma_mask = STM32L0_RCC_DMAEN;
    }
    break;
case STM32_FLASH_TYPE_L5_U5_H5:
    rcc = STM32L5_RCC_AHB1ENR;
    rcc_dma_mask = STM32L5_RCC_DMAEN;
    break;
case STM32_FLASH_TYPE_H7:
    rcc = STM32H7_RCC_AHB1ENR;
    rcc_dma_mask = STM32H7_RCC_DMAEN;
    break;
case STM32_FLASH_TYPE_WB_WL:
    rcc = STM32WB_RCC_AHB1ENR;
    rcc_dma_mask = STM32WB_RCC_DMAEN;
    break;
case STM32_FLASH_TYPE_WB0:
    rcc = STM32WB0_RCC_AHBENR;
    rcc_dma_mask = STM32WB0_RCC_AHB_DMAEN;
    break;
default:
    return;
  }

if(!stlink_read_debug32(sl, rcc, &value)) {
    if(bckpRstr) {
      value = (value & (~rcc_dma_mask)) | fl->rcc_dma_bkp;
    } else {
      fl->rcc_dma_bkp = value & rcc_dma_mask;
      value &= ~rcc_dma_mask;
    }
    stlink_write_debug32(sl, rcc, value);
  }
}

int32_t stlink_flashloader_start(stlink_t *sl, flash_loader_t *fl) {
// disable DMA
  set_dma_state(sl, fl, 0);

// wait for ongoing op to finish
  wait_flash_busy(sl);
// Clear errors
  clear_flash_error(sl);

if((sl->flash_type == STM32_FLASH_TYPE_F2_F4) ||
      (sl->flash_type == STM32_FLASH_TYPE_F7) ||
      (sl->flash_type == STM32_FLASH_TYPE_L4)) {
    ILOG("Starting Flash write for F2/F4/F7/L4\n");

    // Flash loader initialisation
    if(stlink_flash_loader_init(sl, fl) == -1) {
      ELOG("stlink_flash_loader_init() == -1\n");
      return (-1);
    }

    unlock_flash_if(sl); // first unlock the cr

    int32_t voltage;
    if(sl->version.stlink_v == 1) {
      WLOG("STLINK V1 cannot read voltage, use default voltage 3.2 V\n");
      voltage = 3200;
    } else {
      voltage = stlink_target_voltage(sl);
    }

    if(voltage == -1) {
      ELOG("Failed to read Target voltage\n");
      return (-1);
    }

    if(sl->flash_type == STM32_FLASH_TYPE_L4) {
      // L4 does not have a byte-write mode
      if(voltage < 1710) {
        ELOG("Target voltage (%d mV) too low for flash writes!\n", voltage);
        return (-1);
      }
    } else {
      if(voltage > 2700) {
        ILOG("enabling 32-bit flash writes\n");
        write_flash_cr_psiz(sl, 2, BANK_1);
      } else {
        ILOG("Target voltage (%d mV) too low for 32-bit flash, using 8-bit flash writes\n", voltage);
        write_flash_cr_psiz(sl, 0, BANK_1);
      }
    }

    // set programming mode
    set_flash_cr_pg(sl, BANK_1);
  } else if(sl->flash_type == STM32_FLASH_TYPE_WB_WL ||
             sl->flash_type == STM32_FLASH_TYPE_G0 ||
             sl->flash_type == STM32_FLASH_TYPE_G4 ||
             sl->flash_type == STM32_FLASH_TYPE_L5_U5_H5 ||
             sl->flash_type == STM32_FLASH_TYPE_C0) {
    ILOG("Starting Flash write for WB/G0/G4/L5/U5/H5/C0\n");

    unlock_flash_if(sl);         // unlock flash if necessary
    set_flash_cr_pg(sl, BANK_1); // set PG 'allow programming' bit
  } else if(sl->flash_type == STM32_FLASH_TYPE_L0_L1) {
    ILOG("Starting Flash write for L0\n");

    uint32_t val;
    uint32_t flash_regs_base = get_stm32l0_flash_base(sl);

    // disable pecr protection
    stlink_write_debug32(sl, flash_regs_base + FLASH_PEKEYR_OFF, STM32_FLASH_L0_PEKEY1);
    stlink_write_debug32(sl, flash_regs_base + FLASH_PEKEYR_OFF, STM32_FLASH_L0_PEKEY2);

    // check pecr.pelock is cleared
    stlink_read_debug32(sl, flash_regs_base + FLASH_PECR_OFF, &val);
    if(val & (1 << 0)) {
      ELOG("pecr.pelock not clear\n");
      return (-1);
    }

    // unlock program memory
    stlink_write_debug32(sl, flash_regs_base + FLASH_PRGKEYR_OFF, STM32_FLASH_L0_PRGKEY1);
    stlink_write_debug32(sl, flash_regs_base + FLASH_PRGKEYR_OFF, STM32_FLASH_L0_PRGKEY2);

    // check pecr.prglock is cleared
    stlink_read_debug32(sl, flash_regs_base + FLASH_PECR_OFF, &val);
    if(val & (1 << 1)) {
      ELOG("pecr.prglock not clear\n");
      return (-1);
    }

    /* Flash loader initialisation */
    if(stlink_flash_loader_init(sl, fl) == -1) {
      // L0/L1 have fallback to soft write
      WLOG("stlink_flash_loader_init() == -1\n");
    }
  } else if(sl->flash_type == STM32_FLASH_TYPE_WB0) {
    ILOG("Starting Flash write for WB0\n");

    if(stlink_flash_loader_init(sl, fl) == -1) {
      ELOG("stlink_flash_loader_init() == -1\n");
      return (-1);
    }
  } else if((sl->flash_type == STM32_FLASH_TYPE_F0_F1_F3) ||
             (sl->flash_type == STM32_FLASH_TYPE_F1_XL)) {
    ILOG("Starting Flash write for VL/F0/F3/F1_XL\n");

    // flash loader initialisation
    if(stlink_flash_loader_init(sl, fl) == -1) {
      ELOG("stlink_flash_loader_init() == -1\n");
      return (-1);
    }

    // unlock flash
    unlock_flash_if(sl);

    // set programming mode
    set_flash_cr_pg(sl, BANK_1);
    if(sl->flash_type == STM32_FLASH_TYPE_F1_XL) {
      set_flash_cr_pg(sl, BANK_2);
    }
  } else if(sl->flash_type == STM32_FLASH_TYPE_H7) {
    ILOG("Starting Flash write for H7\n");

    unlock_flash_if(sl);         // unlock the cr
    set_flash_cr_pg(sl, BANK_1); // set programming mode
    if(sl->chip_flags & CHIP_F_HAS_DUAL_BANK) {
      set_flash_cr_pg(sl, BANK_2);
    }
    if(sl->chip_id != STM32_CHIPID_H7Ax) {
      // set parallelism
      write_flash_cr_psiz(sl, 3/* 64 bit */, BANK_1);
      if(sl->chip_flags & CHIP_F_HAS_DUAL_BANK) {
        write_flash_cr_psiz(sl, 3/* 64 bit */, BANK_2);
      }
    }
  } else {
    ELOG("unknown coreid, not sure how to write: %x\n", sl->core_id);
    return (-1);
  }

  return (0);
}

int32_t stlink_flashloader_write(stlink_t *sl, flash_loader_t *fl, stm32_addr_t addr, uint8_t *base, uint32_t len) {
  uint32_t off;
  bool is_exclusively_otp = addr >= sl->otp_base && addr < sl->otp_base + sl->otp_size;
  bool is_exclusively_flash = addr >= sl->flash_base && addr < sl->flash_base + sl->flash_size;

  if((sl->flash_type == STM32_FLASH_TYPE_F2_F4) ||
      (sl->flash_type == STM32_FLASH_TYPE_F7) ||
      (sl->flash_type == STM32_FLASH_TYPE_L4) ||
      (sl->flash_type == STM32_FLASH_TYPE_WB0 && is_exclusively_flash)) {
    uint32_t buf_size = sl->sram_size - 0x1000;
    buf_size = buf_size > 0x8000 ? 0x8000 : buf_size;
    for(off = 0; off < len;) {
      uint32_t size = len - off > buf_size ? buf_size : len - off;
      if(stlink_flash_loader_run(sl, fl, addr + off, base + off, size) == -1) {
        ELOG("stlink_flash_loader_run(%#x) failed! == -1\n", (addr + off));
        check_flash_error(sl);
        return (-1);
      }

      off += size;
    }
  } else if(sl->flash_type == STM32_FLASH_TYPE_WB_WL ||
            sl->flash_type == STM32_FLASH_TYPE_G0 ||
            sl->flash_type == STM32_FLASH_TYPE_G4 ||
            sl->flash_type == STM32_FLASH_TYPE_L5_U5_H5 ||
            sl->flash_type == STM32_FLASH_TYPE_C0) {

    if(sl->flash_type == STM32_FLASH_TYPE_L5_U5_H5 && (len % 16)) {
        WLOG("Aligning data size to 16 bytes\n");
        len += 16 - len % 16;
    }
    DLOG("Starting %3u page write\n", len / sl->flash_pgsz);
    for(off = 0; off < len; off += sizeof(uint32_t)) {
      uint32_t data;

      if((off % sl->flash_pgsz) > (sl->flash_pgsz - 5)) {
        fprintf(stdout, "%3u/%-3u pages written\n", (off / sl->flash_pgsz + 1), (len / sl->flash_pgsz));
        fflush(stdout);
      }

      // write_uint32((unsigned char *)&data, *(uint32_t *)(base + off));
      data = 0;
      memcpy(&data, base + off, (len - off) < 4 ? (len - off) : 4);
      stlink_write_debug32(sl, addr + off, data);
      wait_flash_busy(sl); // wait for 'busy' bit in FLASH_SR to clear
    }
    fprintf(stdout, "\n");

    // flash writes happen as 2 words at a time
    if((off / sizeof(uint32_t)) % 2 != 0) {
      stlink_write_debug32(sl, addr + off, 0); // write a single word of zeros
      wait_flash_busy(sl); // wait for 'busy' bit in FLASH_SR to clear
    }
  } else if(sl->flash_type == STM32_FLASH_TYPE_L0_L1) {
    uint32_t val;
    uint32_t flash_regs_base = get_stm32l0_flash_base(sl);
    uint32_t pagesize = (flash_regs_base == STM32_FLASH_L0_REGS_ADDR)? L0_WRITE_BLOCK_SIZE : L1_WRITE_BLOCK_SIZE;

    DLOG("Starting %3u page write\n", len / sl->flash_pgsz);

    off = 0;

    if(len > pagesize) {
      if(stm32l1_write_half_pages(sl, fl, addr, base, len, pagesize)) {
        return (-1);
      } else {
        off = (uint32_t) ((uint64_t) (len / pagesize) * pagesize);
      }
    }

    // write remaining word in program memory
    for(; off < len; off += sizeof(uint32_t)) {
      uint32_t data;

      if((off % sl->flash_pgsz) > (sl->flash_pgsz - 5)) {
        fprintf(stdout, "%3u/%-3u pages written\n", (off / sl->flash_pgsz + 1), (len / sl->flash_pgsz));
        fflush(stdout);
      }

      write_uint32((unsigned char *)&data, *(uint32_t *)(base + off));
      stlink_write_debug32(sl, addr + off, data);

      // wait for sr.busy to be cleared
      do {
        stlink_read_debug32(sl, flash_regs_base + FLASH_SR_OFF, &val);
      } while ((val & (1 << 0)) != 0);

      // TODO: check redo write operation
    }
    fprintf(stdout, "\n");
  } else if((sl->flash_type == STM32_FLASH_TYPE_F0_F1_F3) || (sl->flash_type == STM32_FLASH_TYPE_F1_XL)) {
    int32_t write_block_count = 0;
    for(off = 0; off < len; off += sl->flash_pgsz) {
      // adjust last write size
      uint32_t size = len - off > sl->flash_pgsz ? sl->flash_pgsz : len - off;

      // unlock and set programming mode
      unlock_flash_if(sl);

      DLOG("Finished unlocking flash, running loader!\n");

      if(stlink_flash_loader_run(sl, fl, addr + off, base + off, size) == -1) {
        ELOG("stlink_flash_loader_run(%#x) failed! == -1\n", (addr + off));
        check_flash_error(sl);
        return (-1);
      }

      lock_flash(sl);

      if(sl->verbose >= 1) {
        // show progress; writing procedure is slow and previous errors are
        // misleading
        fprintf(stdout, "%3u/%-3u pages written\n", ++write_block_count,
                (len + sl->flash_pgsz - 1) / sl->flash_pgsz);
        fflush(stdout);
      }
    }
    if(sl->verbose >= 1) {
      fprintf(stdout, "\n");
    }
  } else if(sl->flash_type == STM32_FLASH_TYPE_H7) {
    for(off = 0; off < len;) {
      // Program STM32H7x with 64-byte Flash words
      uint32_t chunk = (len - off > 64) ? 64 : len - off;
      memcpy(sl->q_buf, base + off, chunk);
      stlink_write_mem32(sl, addr + off, 64);
      wait_flash_busy(sl);

      off += chunk;

      if(sl->verbose >= 1) {
        // show progress
        fprintf(stdout, "%u/%u bytes written\n", off, len);
        fflush(stdout);
      }
    }
    if(sl->verbose >= 1) {
      fprintf(stdout, "\n");
    }
  } else if((sl->flash_type == STM32_FLASH_TYPE_WB0) && is_exclusively_otp) {
    // WB0 OTP area can not be written with BURSTWRITE as implemented in flashloader
    // Writes are done as 32bit words, out of bounds bytes are written as 0xFF (no change to flash)
    for(off = 0; off < len; ) {
      uint32_t current_addr = addr + off;
      uint32_t remaining = len - off;
      uint32_t word_base = current_addr & (~0x03);
      uint32_t unused_front = current_addr - word_base;
      uint32_t max_bytes = 4 - unused_front;
      uint32_t bytes_to_copy = remaining < max_bytes ? remaining : max_bytes;

      uint32_t data_word = 0xFFFFFFFF;
      memcpy(((uint8_t*)&data_word) + unused_front, base + off, bytes_to_copy);
      
      stlink_write_debug32(sl, STM32_FLASH_WB0_IRQRAW, STM32_FLASH_WB0_IRQ_ALL);
      stlink_write_debug32(sl, STM32_FLASH_WB0_DATA0, data_word);
      stlink_write_debug32(sl, STM32_FLASH_WB0_ADDRESS, (current_addr) >> 2);
      stlink_write_debug32(sl, STM32_FLASH_WB0_COMMAND, STM32_FLASH_WB0_CMD_OTPWRITE);
      wait_flash_busy(sl);
      
      if (check_flash_error(sl)) {
        ELOG("Failed to write OTP word at 0x%08X!\n", current_addr);
        return (-1);
      }
      off += bytes_to_copy;
    }
  } else {
    return (-1);
  }

  return check_flash_error(sl);
}

int32_t stlink_flashloader_stop(stlink_t *sl, flash_loader_t *fl) {
  uint32_t dhcsr;

  if((sl->flash_type == STM32_FLASH_TYPE_C0) ||
      (sl->flash_type == STM32_FLASH_TYPE_F0_F1_F3) ||
      (sl->flash_type == STM32_FLASH_TYPE_F1_XL) ||
      (sl->flash_type == STM32_FLASH_TYPE_F2_F4) ||
      (sl->flash_type == STM32_FLASH_TYPE_F7) ||
      (sl->flash_type == STM32_FLASH_TYPE_G0) ||
      (sl->flash_type == STM32_FLASH_TYPE_G4) ||
      (sl->flash_type == STM32_FLASH_TYPE_H7) ||
      (sl->flash_type == STM32_FLASH_TYPE_L4) ||
      (sl->flash_type == STM32_FLASH_TYPE_L5_U5_H5) ||
      (sl->flash_type == STM32_FLASH_TYPE_WB_WL)) {

    clear_flash_cr_pg(sl, BANK_1);
    if((sl->flash_type == STM32_FLASH_TYPE_H7 && sl->chip_flags & CHIP_F_HAS_DUAL_BANK) ||
        sl->flash_type == STM32_FLASH_TYPE_F1_XL) {
      clear_flash_cr_pg(sl, BANK_2);
    }
    lock_flash(sl);
  } else if(sl->flash_type == STM32_FLASH_TYPE_L0_L1) {
    uint32_t val;
    uint32_t flash_regs_base = get_stm32l0_flash_base(sl);

    // reset lock bits
    stlink_read_debug32(sl, flash_regs_base + FLASH_PECR_OFF, &val);
    val |= (1 << 0) | (1 << 1) | (1 << 2);
    stlink_write_debug32(sl, flash_regs_base + FLASH_PECR_OFF, val);
  }

  // re-enable interrupts: clear C_MASKINTS in DHCSR
  if(!stlink_read_debug32(sl, STM32_REG_DHCSR, &dhcsr)) {
    stlink_write_debug32(sl, STM32_REG_DHCSR, STM32_REG_DHCSR_DBGKEY | STM32_REG_DHCSR_C_DEBUGEN |
                         (dhcsr & (~STM32_REG_DHCSR_C_MASKINTS)));
  }

  // restore DMA state
  set_dma_state(sl, fl, 1);

  return (0);
} 
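
The WB0 OTP branch of stlink_flashloader_write() above packs an arbitrary byte range into aligned 32-bit words and leaves every byte outside the range as 0xFF, which the listing's own comment describes as "no change to flash". The sketch below isolates just that packing step so it can be checked on a host machine. The function name pack_otp_word and its out-parameters are hypothetical and are not part of stlink. The sketch also returns the aligned word address, whereas the listing writes `current_addr >> 2` to the address register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an stlink API): builds the 32-bit word that would
 * be programmed for the byte stream starting at `addr`.
 * Bytes outside the requested range stay 0xFF.
 * Returns the number of source bytes consumed (1 to 4). */
static uint32_t pack_otp_word(uint32_t addr, const uint8_t *src, uint32_t remaining,
                              uint32_t *word_addr, uint32_t *word) {
    assert(remaining > 0);

    uint32_t word_base = addr & ~(uint32_t)0x03;   /* round down to a word boundary */
    uint32_t unused_front = addr - word_base;      /* bytes skipped at the front    */
    uint32_t max_bytes = 4 - unused_front;         /* room left in this word        */
    uint32_t n = remaining < max_bytes ? remaining : max_bytes;

    uint32_t data = 0xFFFFFFFF;                    /* default: all bytes untouched  */
    memcpy((uint8_t *)&data + unused_front, src, n);

    *word_addr = word_base;
    *word = data;
    return n;
}
```

As a usage illustration with a made-up address: a 3-byte write that starts two bytes into a word takes two calls. The first consumes 2 bytes and leaves the two leading bytes of the word as 0xFF. The second consumes the last byte and leaves the three trailing bytes of the next word as 0xFF.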
This article is part of the Tencent Cloud self-media syndication program and is shared from the WeChat official account 云深之无迹.
Originally published: 2026-02-09. For infringement concerns, contact cloudcommunity@tencent.com to request removal.
