▲ | ActorNightly 9 days ago |
You can't fit the model into a 4090 without quantization; it's around 64 GB. For home use, Gemma 27B QAT is king. It's almost as good as DeepSeek R1.
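For context, a rough back-of-the-envelope check of those sizes (a sketch in Python; it assumes the thread is about gpt-oss, with roughly 117B and 21B parameters at MXFP4's ~4.25 bits per weight, so the figures are approximate):

    # Approximate weight-memory check; parameter counts and bits/weight are assumptions.
    GPU_VRAM_GB = 24  # RTX 4090

    def weight_gb(params_billion, bits_per_weight):
        """Approximate weight memory in GB for a given size and precision."""
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    for name, params_b in [("gpt-oss-120b", 117), ("gpt-oss-20b", 21)]:
        gb = weight_gb(params_b, 4.25)  # MXFP4: 4-bit values plus shared scales ~ 4.25 bits/weight
        fits = "fits" if gb < GPU_VRAM_GB else "does not fit"
        print(f"{name}: ~{gb:.0f} GB of weights, {fits} in {GPU_VRAM_GB} GB VRAM")

This comes out to roughly 62 GB for the 120B (consistent with the "~64 gigs" figure) and roughly 11 GB for the 20B, which is why the smaller one fits on a 4090.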
▲ | SirMaster 8 days ago | parent | next [-]
You don't really need it to fit entirely in VRAM, thanks to the efficient MoE architecture and llama.cpp. The 120B runs at 20 tokens/sec on my 5060 Ti 16GB with 64GB of system RAM. Personally I find 20 tokens/sec quite usable, but for some maybe it's not enough.
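For anyone wanting to try this, here is a minimal sketch using the llama-cpp-python bindings (the GGUF filename, layer split, and context size are assumptions, not the commenter's actual settings); setting n_gpu_layers below the model's total layer count puts that many layers on the GPU and keeps the remaining weights in system RAM:

    from llama_cpp import Llama

    # Partial offload: put as many transformer layers on the GPU as 16 GB allows;
    # the rest of the weights stay in system RAM and run on the CPU.
    llm = Llama(
        model_path="gpt-oss-120b-mxfp4.gguf",  # hypothetical filename
        n_gpu_layers=20,  # assumed value; raise it until VRAM is nearly full
        n_ctx=8192,
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Say hello."}],
        max_tokens=64,
    )
    print(out["choices"][0]["message"]["content"])

Throughput stays usable because an MoE model activates only a few experts per token, so the CPU-resident expert weights are read sparsely on each step rather than all at once.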
▲ | modeless 9 days ago | parent | prev [-]
The 20B one fits.