patchnull 2 hours ago

Same experience here. What worked for me was using CubeMX purely for pin and clock config, then dropping down to the LL (low-layer) drivers or direct CMSIS register access for anything in a hot path. The HAL interrupt handlers in particular add a surprising amount of overhead: in a tight DMA transfer loop I measured roughly 40% of cycles going to HAL callback dispatch alone.

The LL API is basically thin inline wrappers around register writes, so you still get the CubeMX-generated init code but without the HAL abstraction tax at runtime.
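
To make the "thin inline wrapper" point concrete, here is a rough host-side sketch, not ST's actual header code. `mock_gpio_t`, `mock_gpioa`, `MOCK_PIN_5`, `pin_set_direct` and `pin_set_ll_style` are all made-up names so it compiles off-target; on a real part the register block and `GPIOA` come from the CMSIS device header.

```c
#include <stdint.h>

/* Hypothetical stand-in for a GPIO register block so this builds on a host.
 * On real hardware the equivalent type comes from the CMSIS device header. */
typedef struct {
    volatile uint32_t ODR;   /* output data register */
    volatile uint32_t BSRR;  /* bit set/reset register */
} mock_gpio_t;

static mock_gpio_t mock_gpioa;   /* plays the role of GPIOA (assumption) */
#define MOCK_PIN_5 (1u << 5)     /* hypothetical pin mask */

/* Direct register access, CMSIS style: a single store. */
static inline void pin_set_direct(mock_gpio_t *port, uint32_t mask)
{
    port->BSRR = mask;
}

/* LL-style wrapper: a static inline function around the same single store.
 * With optimization on, a compiler will normally inline this so the call
 * site looks the same as the direct version. */
static inline void pin_set_ll_style(mock_gpio_t *port, uint32_t mask)
{
    port->BSRR = mask;
}
```

On target the real equivalent would be something like `LL_GPIO_SetOutputPin(GPIOA, LL_GPIO_PIN_5)` instead of `pin_set_ll_style(&mock_gpioa, MOCK_PIN_5)`. Either way it is one register write with no handle lookup or callback in between.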