SirMaster 8 days ago
I'm getting 20 tokens/sec on the 120B model with a 5060 Ti 16GB and a regular desktop Ryzen 7 7800X3D with 64GB of DDR5-6000.
wkat4242 8 days ago
Wow, that's not bad. It's strange: for me it is much, much slower on a Radeon Pro VII (also 16GB, with a memory bandwidth of 1TB/s!) and a Ryzen 5 5600, also with 64GB. It's basically unworkably slow. Also, when I check ollama ps it shows 100% CPU; the GPU is not being used at all :( That's counterproductive too, because the model is just too large for 64GB. I wonder what makes it work so well on yours! My CPU isn't much slower, and my GPU is probably faster.