mustyoshi 3 days ago

Yeah, this is the thing people miss a lot. 7B-32B models work perfectly fine for a lot of things, and run on previous-generation high-end consumer hardware.
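
Back-of-envelope math for why that's true (treating the ~20% overhead for KV cache and activations as a loose assumption, not a measurement):

    # Approximate VRAM needed for a quantized model:
    # parameters * bytes-per-parameter, plus ~20% overhead
    # (loose assumption) for KV cache and activations.
    def vram_gb(params_billions: float, bits: int, overhead: float = 1.2) -> float:
        return params_billions * (bits / 8) * overhead

    for size in (7, 32):
        print(f"{size}B @ 4-bit ~= {vram_gb(size, 4):.1f} GB")
    # 7B  @ 4-bit ~= 4.2 GB  -> fits an 8 GB consumer GPU
    # 32B @ 4-bit ~= 19.2 GB -> fits a 24 GB card like an RTX 3090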

But we're still in the hype phase; people will come to their senses once large-model performance starts to plateau.

_heimdall 3 days ago | parent | next [-]

I expect people to come to their senses when LLM companies stop subsidizing costs and start charging customers what it actually costs to train and run these models.

gunalx 3 days ago | parent [-]

I mean, there's no reason for an inference provider of open models to subsidize you. And costs there are usually cheaper than Claude API pricing.

_heimdall 3 days ago | parent [-]

It's still a market though; there's always an incentive to subsidize if all the competition is keeping prices artificially low.

zamadatix 3 days ago | parent | prev | next [-]

People don't want to guess which size of model is right for a task, and current systems are neither good nor efficient at estimating that automatically. I only see power users tweaking more and more as performance plateaus; the average user will only change models when it happens automatically.
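
To make that concrete, the crude kind of automatic routing available today looks roughly like this naive sketch (tier names and thresholds are made up):

    # Naive size-routing sketch: route by surface features of the
    # prompt. Heuristics like this are exactly why automatic
    # estimation is neither good nor efficient today.
    def pick_model(prompt: str) -> str:
        hard_markers = ("prove", "refactor", "debug", "architecture")
        if len(prompt) > 2000 or any(m in prompt.lower() for m in hard_markers):
            return "large-remote"  # expensive cloud model
        return "small-local"       # 7B-class local model

    print(pick_model("What's the capital of France?"))         # small-local
    print(pick_model("Refactor this module for testability"))  # large-remote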

bakugo 3 days ago | parent | prev [-]

> 7B-32B models work perfectly fine for a lot of things

Like what? People always talk about how amazing it is that they can run models on their own devices, but rarely mention what they actually use them for. For most use cases, small local models will always perform significantly worse than even the most inexpensive cloud models like Gemini Flash.

totaa 3 days ago | parent [-]

Gemma 3n E4B has been crazy good for me: a fine-tune running on Google Cloud Run via Ollama, completely avoiding token-based pricing at the cost of throughput limitations.
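
For anyone curious, the client side of that setup looks roughly like this; the URL and model tag below are hypothetical placeholders, not the actual deployment:

    # Calling an Ollama server hosted on Cloud Run. The URL and model
    # tag are hypothetical placeholders, not the real deployment.
    import requests

    OLLAMA_URL = "https://my-ollama-xyz.a.run.app"  # hypothetical
    resp = requests.post(
        f"{OLLAMA_URL}/api/chat",
        json={
            "model": "gemma3n:e4b",  # a fine-tune would use its own tag
            "messages": [{"role": "user", "content": "Classify: great product!"}],
            "stream": False,
        },
        timeout=120,
    )
    print(resp.json()["message"]["content"])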

pigeonhole123 3 days ago | parent [-]

What kind of applications are you using it for?