lelanthran 5 days ago
> On r/localllama there is someone that got 120B OSS running on 8gb ram and 35 tokens/sec from the CPU (!!) after noticing 120B has a different architecture of only 5B “active” parameters

If anyone else was as interested as I was, here's the link: https://www.reddit.com/r/LocalLLaMA/comments/1mke7ef/120b_ru...