This is still using a Tomasulo-like algorithm; it has just been shifted from the backend to the front end. And instructions don't lock up on an L1 miss. Instead, the results of that instruction are marked as poisoned, and the front end replays the affected micro-ops further forward in the execution stream once the L1 miss is resolved. As the article points out, this replay is likely to fill otherwise unused execution slots on general-purpose code, as OoO CPUs rarely sustain their full execution width.
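To make the poison-and-replay idea concrete, here is a rough toy model in Python. It is only a sketch of the general technique as described above, not the actual design: the `MicroOp` fields, the `run` function, the single-issue pipeline, and the fixed `miss_latency` are all assumptions made up for illustration. Because it issues only one micro-op per cycle, it does not model replays filling spare execution slots; it only shows independent work continuing past a miss.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    name: str
    dest: str
    srcs: tuple = ()
    l1_miss: bool = False  # assumption: this load misses L1 the first time it issues


def run(program, miss_latency=3):
    """Issue one micro-op per cycle; return (cycle, name) for each valid completion."""
    pending = deque(program)
    replay = deque()    # FIFO of (ready_cycle, op) waiting to be replayed
    poisoned = {}       # register -> cycle at which its value should be valid
    miss_started = set()
    completed = []
    cycle = 0
    while pending or replay:
        if replay and replay[0][0] <= cycle:
            _, op = replay.popleft()      # a replay is due: it takes the slot
        elif pending:
            op = pending.popleft()        # otherwise keep racing ahead
        else:
            cycle += 1                    # nothing to issue: wait for the miss
            continue
        bad_srcs = [poisoned[s] for s in op.srcs if s in poisoned]
        if bad_srcs:
            # An input is poisoned, so the result is poisoned too; replay later.
            ready = max(bad_srcs)
            poisoned[op.dest] = ready
            replay.append((ready, op))
        elif op.l1_miss and op.name not in miss_started:
            # L1 miss: do not stall, poison the result and schedule a replay.
            miss_started.add(op.name)
            ready = cycle + miss_latency
            poisoned[op.dest] = ready
            replay.append((ready, op))
        else:
            poisoned.pop(op.dest, None)   # valid result clears any poison
            completed.append((cycle, op.name))
        cycle += 1
    return completed


demo = [
    MicroOp("load", "r1", (), l1_miss=True),
    MicroOp("add", "r2", ("r1",)),   # depends on the missing load
    MicroOp("mul", "r3", ()),        # independent work
    MicroOp("sub", "r4", ()),        # independent work
]
print(run(demo))
```

In this toy run the independent `mul` completes while the load is still outstanding, and the poisoned `load` and `add` are replayed once the miss latency has elapsed.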

It's a smart idea, and has some parallels to the Mill CPU design. The backend is conceptually similar to a statically scheduled VLIW core, and the front end races ahead using its matrix scorecard, trying to queue up as much work as it can for the backend despite the presence of unpredictable latencies.


quantummagic 3 days ago | parent [-]

> Mill CPU design

There were some fascinating concepts being explored in that project. It's a shame nothing came of it.

	Findecanor 2 days ago | parent [-]
		In the last post on their forum a month ago, they claimed that they were alive and making progress, but I dunno... What I'm afraid of is that perhaps they have been shifting their goal a little too often, which of course would delay their time to market. For example, I think they have shifted from straightforward fixed-width SIMD to scalable vectors of some sort, and last I heard they were talking about AI, which usually means that there's some kind of support for matrix multiplication.