isoprophlex 3 hours ago
Extremely impressive, but can one really run these >200B param models on prem in any cost-effective way? Even if you get your hands on cards with 80GB of RAM, you still need to tie them together in a low-latency, high-bandwidth manner. It seems to me that small/medium-sized players would still need a third party to get inference going on these frontier-quality models, and we're not in a fully self-owned, self-hosted place yet. I'd love to be proven wrong though.
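For concreteness, a rough back-of-envelope on the weight memory alone (my assumptions, not hard numbers: dense ~200B model, weights only, ignoring KV cache and runtime overhead):

    # Rough weight-memory estimate for a dense ~200B-parameter model.
    # Assumptions: weights only, no KV cache, no activation/runtime overhead.
    PARAMS = 200e9  # ~200B parameters

    quant_bytes = {
        "fp16/bf16": 2.0,
        "int8":      1.0,
        "4-bit":     0.5,
    }

    for name, bytes_per_param in quant_bytes.items():
        gib = PARAMS * bytes_per_param / 2**30
        cards = -(-gib // 80)  # ceiling division: minimum number of 80GB cards
        print(f"{name:10s} ~{gib:6.0f} GiB -> at least {int(cards)} x 80GB cards")

At fp16 that's roughly 370 GiB (5+ cards with fast interconnect), while a 4-bit quant lands somewhere around 90-100 GiB, which is about the ballpark of the single-box unified-memory machines mentioned in the replies below.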
Borealid 2 hours ago
A Framework Desktop exposes 96GB of RAM for inference and costs a few thou USD.
buyucu 36 minutes ago
I'm running them on GMKTec Evo 2.