Curiositry 4 hours ago
Qwen3.5 9B seems to be fairly competent at OCR and text-formatting cleanup running in llama.cpp on CPU, albeit slowly. However, I have compiled it umpteen ways and still haven't gotten GPU offloading working properly (which I had working with Ollama) on an old 1650 Ti with 4GB VRAM — it tries to allocate too much memory.
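A common workaround for over-allocation on small-VRAM cards is partial offload: cap the number of layers sent to the GPU instead of letting llama.cpp offload everything. A minimal sketch (the model path is a placeholder, and the right `-ngl` value depends on the model and your 4GB card — lower it until allocation succeeds):

```shell
# Offload only some layers to the GPU; the rest stay on CPU.
# -ngl / --n-gpu-layers controls how many layers are offloaded.
# Start low (e.g. 10) on a 4GB card and raise it until you hit OOM.
./llama-cli \
  -m ./models/model.gguf \   # placeholder path to your GGUF file
  -ngl 10 \                  # partial offload instead of full
  -p "Hello"
```

You can also shrink `--ctx-size` to reduce the KV-cache allocation if layers alone don't fit.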
acters 3 hours ago | parent
I have a 1660 Ti, and the cachyos + aur/llama.cpp-cuda package is working fine for me. With about 5.3 GB of usable memory, I find that the 35B model is by far the most capable one, and it performs just as fast as the 4B model that fits entirely on my GPU. I did try the 9B model and it was surprisingly capable, but 35B is still better in some of my own anecdotal test cases. Very happy with the improvement. However, I notice that Qwen 3.5 is about half the speed of Qwen 3.
WhyNotHugo 2 hours ago | parent
If you’re building from source, the Vulkan backend is the easiest to build and use for GPU offloading.
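For reference, a Vulkan build of llama.cpp looks roughly like this (a sketch — the exact CMake flag has changed across llama.cpp versions, so check the repo's build docs for your checkout; recent versions use `GGML_VULKAN`):

```shell
# Requires the Vulkan SDK / headers and a Vulkan-capable driver installed.
cmake -B build -DGGML_VULKAN=ON   # older trees used -DLLAMA_VULKAN=ON
cmake --build build --config Release -j
```

The resulting binaries land under `build/bin/`, and GPU offload is then enabled with the usual `-ngl` flag.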