SirMaster 8 days ago

I'm getting 20 tokens/sec on the 120B model with a 5060Ti 16GB and a regular desktop Ryzen 7 7800X3D with 64GB of DDR5-6000.

wkat4242 8 days ago | parent [-]

Wow, that's not bad. Strangely, for me it's much, much slower on a Radeon Pro VII (also 16GB, with 1TB/s of memory bandwidth!) and a Ryzen 5 5600, also with 64GB. It's basically unworkably slow. When I check ollama ps it shows 100% CPU; the GPU isn't being used at all :( That's also counterproductive because the model is just too large for 64GB.

I wonder what makes it work so well on yours! My CPU isn't much slower, and my GPU is probably faster.
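In case it helps anyone debugging the same thing, here's the kind of sanity check I'd run, assuming a recent Ollama build with the ROCm runtime installed. The model tag and the num_gpu value are just examples, not anything from the thread:

    # Show loaded models and how they're split between CPU and GPU.
    # The PROCESSOR column reads e.g. "100% GPU", "100% CPU" or "48%/52% CPU/GPU".
    ollama ps

    # Start the server with debug logging to see why it picked (or skipped)
    # the GPU; look for lines mentioning rocm/amdgpu and the detected VRAM.
    OLLAMA_DEBUG=1 ollama serve

    # From the interactive prompt you can force more (or fewer) layers onto
    # the GPU and see how the CPU/GPU split in "ollama ps" changes.
    ollama run gpt-oss:120b
    >>> /set parameter num_gpu 20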

magicalhippo 8 days ago | parent [-]

AMD basically decided they wanted to focus on HPC and data center customers rather than consumers, and so GPGPU driver support for consumer cards has been non-existent or terrible [1].

[1]: https://github.com/ROCm/ROCm/discussions/3893
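A rough way to check whether ROCm actually sees the card before blaming the model, assuming the ROCm userspace tools are installed. The override value is purely illustrative; officially supported or merely deprecated targets like gfx906 may not need it at all:

    # List the agents ROCm can see; a discrete GPU should show up as a gfx*
    # target (the Radeon VII / Radeon Pro VII are gfx906).
    rocminfo | grep -i gfx

    # Show VRAM per card to confirm the runtime can talk to it.
    rocm-smi --showmeminfo vram

    # For cards dropped from the official support list, people often spoof a
    # supported target before starting the Ollama server (illustrative only).
    HSA_OVERRIDE_GFX_VERSION=9.0.6 ollama serve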

wkat4242 6 days ago | parent [-]

The Radeon Pro VII is not a consumer card though, and it works well with ROCm. It even has datacenter-grade HBM2 memory that most Nvidia cards don't have. Ongoing support has been dropped, but ROCm of course still works fine. It's nearly as fast in Ollama as my 4090 (which I don't use for AI regularly, but I play with it sometimes).