ramon156 3 hours ago
These don't run 200B models at all; results show they can run 13B at best. 70B is ~3 tok/s according to someone on Reddit.
Borealid 2 hours ago
I don't know where you got those numbers, but they're wrong. https://www.reddit.com/r/LocalLLaMA/comments/1n79udw/inferen... looks reputable and comparable to the Framework Desktop: they didn't just quote a number, they showed benchmark output. I get far more than 3 t/s for a 70B model on normal non-unified RAM, so that figure is implausibly low for a unified memory architecture like Halo.
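For sanity-checking figures like these, a common back-of-envelope rule is that single-stream decoding of a dense model is memory-bandwidth bound. Each generated token reads roughly all of the weights once, so an approximate ceiling is bandwidth divided by model size. This is only a rough sketch. It ignores KV-cache traffic, compute limits and software overhead, so real throughput will be lower. The inputs below (about 4.5 bits per weight for a 4-bit quant, about 256 GB/s of memory bandwidth) are assumptions for illustration, not measured values for any specific machine.

```python
def max_decode_tokens_per_sec(params_billion: float,
                              bits_per_weight: float,
                              bandwidth_gb_s: float) -> float:
    """Rough bandwidth-bound ceiling on single-stream decode speed (tokens/s).

    Assumes a dense model whose weights are all read once per generated
    token; ignores KV cache, compute and overheads (real speed is lower).
    """
    # Weight size in GB: (params_billion * 1e9 params * bits) / 8 bits-per-byte / 1e9
    model_gb = params_billion * bits_per_weight / 8
    return bandwidth_gb_s / model_gb


# Hypothetical inputs: 70B dense model at ~4.5 bits/weight, assumed ~256 GB/s memory.
print(f"{max_decode_tokens_per_sec(70, 4.5, 256):.1f} tok/s ceiling")  # 6.5 tok/s ceiling
```

Because the estimate scales linearly with bandwidth, it shows why memory bandwidth, rather than raw compute, usually dominates local inference speed for large dense models.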