Remix.run Logo
kennywinker 2 days ago

Yes, but that comes at the cost of using a dumber llm. The state of the art ones are only available via commercial api, and the best self-hostable models require $10,000+ gpus.

This is a problem for coding as smarter really has an impact there, but there are so so so many tasks that an 8b model that runs on a $200 gpu can handle nicely. Scrape this page and dump json? Yeah that’s gonna be fine.

This is my conclusion based on a week or so of using ollama + qwen3.5:3b self hosted on a ~10 year old dell optiplex with only the built-in gpu. You don’t need state of the art to do simple tasks.

gbro3n a day ago | parent | next [-]

I saw that the Hetzner matrix like has GPU servers < £300 per month (plus set up fee). I haven't tried it but I think if I was getting up to that sort of spend I'd be setting up Ollama on one of those with a larger Qwen3 max model (which I hear is on par with Opus 4.5?? - I haven't been able to try Qwen yet though so that could be b*****ks).

hhh a day ago | parent [-]

I have tried most of the major open source models now and they all feel okay, but i’d prefer Sonnet or something any day over them. Not even close in capability for general tasks in my experience.

TheDong a day ago | parent | prev [-]

> Scrape this page and dump json? Yeah that’s gonna be fine.

Only gonna be fine on a trusted page, an 8b model can be prompt injected incredibly trivially compared to larger ones.

kennywinker a day ago | parent [-]

Relying on the model to protect you seems like a bad idea…

TheDong a day ago | parent [-]

I mean, clawbots are inherently insecure. Using a better model is defense in depth.

Obviously you should also take precautions, like never instructing it to invoke the browser tool on untrusted sites, avoiding feeding it untrusted inputs where possible in other places, giving it dedicated and locked-down credentials where possible....

But yeah, at this point it's inherent to LLMs that we cannot do something like SQL prepared statements where "tainted" strings are isolated. There is no perfect solution, but using the best model we can is at least a good precaution to stack on top of all our other half-measures.