It is simply marketing nonsense. What they really mean (I think) is that they support matrix multiplication (matmul) at the hardware level. Since AI workloads are mostly matrix multiplications, you'll get much faster inference (and some speedup in training too) on this new hardware. I'm looking forward to seeing how fast a local 96 GB+ LLM runs on the M5 Max with 128 GB of RAM.

manmal 6 hours ago | parent [-]

We've already established in this thread that memory bandwidth isn't that much greater than on the M4 Max (12%?). However, I wonder if batched inference will benefit greatly from the vastly improved compute. My guess is that parallel usage of the same model will be a couple of times faster. So single-"threaded" use won't be that much better, but if you want to run a lot of batch jobs, it'd be way faster?
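
To make that guess concrete, here's a rough roofline-style sketch. It assumes each decode step streams the full set of weights from memory once, regardless of batch size, and does a fixed amount of math per sequence in the batch. Every number below is a made-up placeholder (the 500 GB/s, the 30 TFLOPS, and the 4x compute jump are NOT real M4/M5 specs; only the 12% bandwidth bump comes from this thread), and it ignores KV-cache traffic and prefill entirely.

```python
def decode_tokens_per_second(batch, weight_bytes, flops_per_token, bandwidth, compute):
    """Roofline-style estimate for one decode step.

    The step is limited by whichever is slower: reading all weights once
    (shared across the batch) or doing the math for every sequence.
    """
    memory_time = weight_bytes / bandwidth             # seconds to stream weights
    compute_time = batch * flops_per_token / compute   # seconds of arithmetic
    return batch / max(memory_time, compute_time)


# Hypothetical placeholders, not measured hardware specs.
WEIGHT_BYTES = 96e9               # 96 GB model, assuming 1 byte per weight
FLOPS_PER_TOKEN = 2 * 96e9        # ~2 FLOPs per weight per generated token
OLD = dict(bandwidth=500e9, compute=30e12)
NEW = dict(bandwidth=500e9 * 1.12, compute=30e12 * 4)  # +12% bandwidth, assumed 4x compute


def speedup(batch):
    """Throughput ratio of the hypothetical new chip over the old one."""
    old = decode_tokens_per_second(batch, WEIGHT_BYTES, FLOPS_PER_TOKEN, **OLD)
    new = decode_tokens_per_second(batch, WEIGHT_BYTES, FLOPS_PER_TOKEN, **NEW)
    return new / old


if __name__ == "__main__":
    for batch in (1, 8, 64, 256):
        print(f"batch={batch:4d}  speedup={speedup(batch):.2f}x")
```

Under these assumptions a single stream only gains the bandwidth ratio, while large batches drift toward the compute ratio, which is the shape of the argument above. The real crossover point depends on actual specs and on how much the KV cache adds to memory traffic.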

	andy_ppp 3 hours ago | parent [-]
		Is this a reply to a different comment?