zargon an hour ago
Yes, prefill is definitely the bottleneck for most use cases besides "chatting". It's the reason I have never bought a Mac for LLM purposes. It's frustrating trying to find benchmarks, because almost everyone gives decode speed without mentioning prefill speed.