k__ 9 days ago:
Half-OT: Anything useful that runs reasonably fast on a regular Intel CPU/GPU?
oblio 9 days ago:
I did a bunch of research, and basically no. Unless you can live with sending a request in the evening and getting the result in the morning. You'd also need a lot of regular RAM, because otherwise you start swapping, at which point I think response times end up in days. This tech is in its Wild West days; for it to be usable by the average person on consumer hardware, I think we'll need to be in 2030+.
ethan_smith 9 days ago:
For Intel CPUs, Phi-2 (2.7B) and TinyLlama (1.1B) run reasonably well using llama.cpp with 4-bit quantization. GGUF models with INT4 weights typically need roughly 0.5-0.7GB of RAM per billion parameters (4 bits is half a byte per weight, plus quantization overhead and the context cache), so even older machines can handle the smaller models.
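If you want a concrete starting point, here's a minimal sketch using the llama-cpp-python bindings; the model filename, thread count, and prompt are placeholders for whatever 4-bit GGUF file you actually download:

    # Minimal CPU inference sketch (pip install llama-cpp-python).
    # The model path is a placeholder -- point it at any 4-bit GGUF
    # you've downloaded, e.g. a Phi-2 or TinyLlama quant.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./phi-2.Q4_K_M.gguf",  # ~1.7GB on disk for 2.7B at 4-bit
        n_ctx=2048,    # context window; larger windows cost more RAM
        n_threads=4,   # set to your physical core count
    )

    out = llm("Explain GGUF quantization in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])

The same pattern works for TinyLlama or any other GGUF quant; only the model path changes.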