PlatoIsADisease 2 hours ago
You might want to clarify that this is more of a "look, it technically works" than an "I actually use this." The difference between waiting 20 minutes for an answer to the prompt '1+1=' and actually using it for something useful is massive. I wonder where this idea of running AI on CPU comes from. Was it Apple astroturfing? Apple fanboys? I don't see people wasting time on non-Apple CPUs. (Although I did do this with a 7B model.)
mholm an hour ago | parent
The reason Macs get recommended is unified memory, which the GPU can use as VRAM. People use the AMD Strix Halo for AI for the same reason: it has a similar memory architecture. Time to first token for something like '1+1=' would be seconds, and then you'd get ~20 tokens per second, which is plenty fast for regular use. Tokens/s slows down at the high end of context, but it's still practical for a lot of use cases. Though I agree that agentic coding, especially over large projects, would likely get too slow to be practical.
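The arithmetic behind "plenty fast for regular use" is easy to sketch. A rough latency model is time-to-first-token plus output length divided by throughput; the numbers below (2 s TTFT, 20 tok/s) are illustrative assumptions, not benchmarks:

```python
def generation_time_s(n_tokens: float, tok_per_s: float, ttft_s: float = 2.0) -> float:
    """Estimated wall-clock seconds to produce n_tokens of output:
    time-to-first-token plus decode time at a steady tokens/sec rate."""
    return ttft_s + n_tokens / tok_per_s

# A short chat reply is effectively interactive, but a long agentic
# turn at the same rate takes minutes, matching the comment above.
print(generation_time_s(100, 20.0))    # 100-token reply: ~7 s
print(generation_time_s(4000, 20.0))   # 4000-token turn: ~202 s
```

In practice throughput also degrades as context fills, so the long-turn estimate is optimistic; that is the "too slow for agentic coding over large projects" effect.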
tucnak 2 hours ago | parent
The Mac Studio route is not "AI on CPU": the M2/M4 are complex SoCs that include a GPU with unified memory access.