ramon156 3 hours ago

These don't run 200B models at all; the results show they can run 13B at best. 70B is ~3 tok/s according to someone on Reddit.

Borealid 2 hours ago | parent [-]

I don't know where you got those numbers, but they're wrong.

https://www.reddit.com/r/LocalLLaMA/comments/1n79udw/inferen... seems comparable to the Framework Desktop and reputable - they didn't just quote a number, they showed benchmark output.

I get far more than 3 t/s for a 70B model on normal non-unified RAM, so 3 t/s is an implausibly low figure for a unified memory architecture like Halo.
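
For a rough sanity check on numbers like these, here is a minimal sketch of the usual bandwidth-bound rule of thumb: when generating tokens with a dense model, every weight is read roughly once per token, so decode speed is capped at about memory bandwidth divided by model size in memory. The function name, the 256 GB/s bandwidth and the 4-bit (0.5 bytes per parameter) quantization are my own assumptions for illustration, not measurements from the linked benchmark, and real throughput will land below this ceiling.

```python
def decode_tokens_per_second(bandwidth_gb_s: float,
                             params_billions: float,
                             bytes_per_param: float) -> float:
    """Rough upper bound on decode speed for a dense model,
    assuming generation is limited purely by memory bandwidth."""
    # Size of the weights in GB; each generated token reads all of them once.
    model_gb = params_billions * bytes_per_param
    return bandwidth_gb_s / model_gb


# Assumed inputs (hypothetical, for illustration only):
#   256 GB/s memory bandwidth, 70B parameters, 4-bit weights (0.5 bytes/param)
estimate = decode_tokens_per_second(256, 70, 0.5)
print(f"{estimate:.1f} t/s")  # prints "7.3 t/s"
```

Plug in your own bandwidth and quantization to see whether a quoted t/s figure is even in the right ballpark.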