Remix.run Logo
whizzter 3 days ago

It's the gaming/HPC focus, sure you can achieve some stunning benchmark numbers with nice vectorized straightforward code.

In the real world we have our computers running JIT'ed JS, Java or similar code taking up our cpu time, tons of small branches (mostly taken the same way and easily remembered by the branch predictor) and scattering reads/writes all over memory.

Transistors not spent on larger branch prediction caches or L1 caches are badly spent, doesn't matter if the CPU can issue a few less instructions per clock to ace an benchmark if it's waiting for branch mispredictions or cache misses most of the time.

There's no coincidence that the Apple teams iirc are partly the same people that built Pentium-M (that begun the Core era by delivering very good perf on mobile chips when P4 was supposed to be the flagship).