bigyabai an hour ago
LLM decode speed depends heavily on memory bandwidth, not just on having lots of memory, because each generated token requires reading the model's weights from memory. You can't just say "X amount of RAM": the memory bandwidth of an M1 is 68.3 GB/s, versus 614 GB/s on an M5 Max or 1.01 TB/s of GDDR6X on a 4090. That creates a bottleneck on the oldest/cheapest Apple Silicon machines, which are already crippled for context prefill.
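To put rough numbers on that, here's a back-of-the-envelope sketch. It assumes a dense model whose weights are streamed from memory once per generated token, and it ignores KV-cache reads, activations, and compute. The 7B/4-bit model size and the helper names are hypothetical; only the bandwidth figures come from the comment above.

```python
# Rough upper bound on decode speed for a memory-bandwidth-bound LLM.
# Assumption: a dense model reads all of its weights from memory once per
# generated token; KV-cache reads, activations, and compute are ignored.

# Peak memory bandwidth in GB/s, using the figures quoted in the comment.
BANDWIDTH_GBPS = {
    "M1": 68.3,
    "M5 Max": 614.0,
    "RTX 4090": 1010.0,
}


def weight_bytes(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in bytes (hypothetical helper)."""
    return params_billion * 1e9 * bits_per_weight / 8


def decode_ceiling_tokens_per_s(bandwidth_gbps: float, model_bytes: float) -> float:
    """Tokens/s if every token requires reading all weights exactly once."""
    return bandwidth_gbps * 1e9 / model_bytes


# Hypothetical example: a 7B-parameter model quantized to 4 bits (~3.5 GB).
model_bytes = weight_bytes(7, 4)
ceilings = {
    name: decode_ceiling_tokens_per_s(bw, model_bytes)
    for name, bw in BANDWIDTH_GBPS.items()
}

for name, tps in ceilings.items():
    print(f"{name:>9}: ~{tps:.0f} tokens/s ceiling")
```

This prints ceilings of roughly 20, 175, and 289 tokens/s. Real throughput will be lower, but the ratio between machines tracks bandwidth, not RAM capacity.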
h14h an hour ago
Thanks for clarifying -- I was oversimplifying. But honestly, obsoleting a huge number of otherwise great Apple Silicon machines is something Apple would consider a major "pro" of building a compelling local AI stack. Given all the speculation about how hard it has been for Apple to get people to upgrade from the M1, I'm sure they'd jump at such an opportunity.