kennywinker 2 days ago
Yes, but that comes at the cost of using a dumber LLM. The state-of-the-art ones are only available via commercial API, and the best self-hostable models require $10,000+ GPUs. This is a problem for coding, where smarter really does have an impact, but there are so so so many tasks that an 8b model running on a $200 GPU can handle nicely. Scrape this page and dump JSON? Yeah, that's gonna be fine.

This is my conclusion based on a week or so of using ollama + qwen3.5:3b self-hosted on a ~10 year old Dell Optiplex with only the built-in GPU. You don't need state of the art to do simple tasks.
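For concreteness, the "scrape this page and dump JSON" workflow against a local Ollama server looks roughly like the sketch below. This is a minimal illustration, not the commenter's actual setup: it assumes Ollama is running on its default port (11434), the model name and the extraction prompt are placeholders, and the helper names (`build_payload`, `scrape_to_json`) are made up for this example. Ollama's `/api/generate` endpoint does accept `"format": "json"` to constrain the model to valid JSON output.

```python
# Hedged sketch: asking a small local model (via Ollama) to turn scraped
# page text into JSON. Model name and prompt are illustrative.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint
MODEL = "qwen2.5:3b"  # illustrative; any small local model you have pulled


def build_payload(page_text: str) -> dict:
    """Build the request body for Ollama's /api/generate endpoint."""
    return {
        "model": MODEL,
        "prompt": (
            "Extract every product name and price from this page "
            "and return them as a JSON array of objects:\n\n" + page_text
        ),
        "format": "json",  # ask Ollama to constrain output to valid JSON
        "stream": False,   # one complete response instead of a chunk stream
    }


def scrape_to_json(page_text: str):
    """Send the page text to the local model and parse its JSON reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(page_text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    # Ollama wraps the model's text in a "response" field; with
    # format=json that text should itself parse as JSON.
    return json.loads(body["response"])
```

On a 3b-8b model this kind of narrow, well-scoped extraction is exactly where local inference tends to be good enough; you trade raw capability for zero per-call cost.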
gbro3n a day ago
I saw that the Hetzner server matrix has GPU servers at under £300 per month (plus a setup fee). I haven't tried it, but I think if I was getting up to that sort of spend I'd be setting up Ollama on one of those with a larger Qwen3 max model (which I hear is on par with Opus 4.5?? I haven't been able to try Qwen yet though, so that could be b*****ks).
TheDong a day ago
> Scrape this page and dump json? Yeah that's gonna be fine.

Only gonna be fine on a trusted page. An 8b model can be prompt injected incredibly trivially compared to larger ones.