immibis 2 days ago

The head of the pipeline is at least several clock cycles ahead of the tail, by definition. By the time the branch instruction reaches the part of the CPU that decides whether to branch or not, the next several instructions have already been fetched, decoded and partially executed, and that work is thrown away on a mispredicted branch.

There may not be a large delay when executing from TCM (tightly coupled memory) with a short pipeline, but it's still there. It can be so small that it doesn't justify the expense of a branch predictor. Many microcontrollers are optimized for power consumption, which means simplicity. I expect microcontroller-class chips to largely run in-order with short pipelines and low-ish clock speeds, although there are exceptions. Older generations of microcontrollers (PIC/AVR) were barely pipelined at all, with just a two-stage fetch/execute overlap.
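
To put a rough number on "so small that it doesn't justify the expense", here is a back-of-the-envelope sketch. The function name and every input value are made up for illustration; none of them are measurements from a real chip.

```python
def branch_cpi_overhead(branch_fraction, wrong_guess_rate, penalty_cycles):
    """Average extra cycles per instruction lost to wrong-path fetches.

    branch_fraction:  share of executed instructions that are branches
    wrong_guess_rate: share of those branches where the fetched path was wrong
    penalty_cycles:   cycles of fetched work discarded per wrong guess
    """
    return branch_fraction * wrong_guess_rate * penalty_cycles


# Hypothetical short-pipeline core with no predictor: 1 cycle lost per wrong guess.
short_pipeline = branch_cpi_overhead(0.2, 0.5, 1)

# Hypothetical deep-pipeline core with the same branch mix: 15 cycles lost per wrong guess.
deep_pipeline = branch_cpi_overhead(0.2, 0.5, 15)

print(short_pipeline, deep_pipeline)
```

With the same branch mix, the cost scales directly with the penalty, so a predictor that pays for itself on the deep pipeline may save very little on the short one.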

addaon 2 days ago

> but it's still there

Unless you evaluate branches in the second stage of the pipeline and forward them. Or add a delay slot and forward them from the third stage. In the typical case you’re of course correct, but there are many approaches out there.
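
A toy cycle-counting model of those cases, assuming stages are numbered from 1 (fetch) and that "forwarding" means the branch outcome steers the fetch in the same cycle it is computed. The function and parameter names are hypothetical, and real cores differ in plenty of details this ignores.

```python
def wasted_fetch_slots(resolve_stage, delay_slots=0, forward_to_fetch=False):
    """Count fetched instructions discarded when a branch goes the other way.

    resolve_stage:    pipeline stage (1 = fetch) where the branch is evaluated
    delay_slots:      instructions after the branch that always execute
    forward_to_fetch: outcome reaches the fetch stage in the cycle it is computed
    """
    if resolve_stage < 1 or delay_slots < 0:
        raise ValueError("resolve_stage must be >= 1 and delay_slots >= 0")
    # The branch sits in stage s during cycle s, so fetch has to guess in
    # cycles 2..s, or only 2..s-1 if the outcome is forwarded within cycle s.
    guessed = resolve_stage - (2 if forward_to_fetch else 1)
    # Delay-slot instructions execute either way, so they are never wasted.
    return max(0, guessed - delay_slots)


print(wasted_fetch_slots(4))                                        # resolve late, no tricks
print(wasted_fetch_slots(2, forward_to_fetch=True))                 # second stage + forwarding
print(wasted_fetch_slots(3, delay_slots=1, forward_to_fetch=True))  # third stage + delay slot
```

Under these assumptions the last two configurations discard nothing, which is the sense in which the delay can go away entirely rather than just shrink.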