| ▲ | herodoturtle 10 hours ago | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I’m curious (and please forgive my ignorance if it’s obvious), are open weight models practically feasible? I mean from a financial and sustainability standpoint, assuming they’re equally powerful as their proprietary counterparts. I guess I’m trying to understand the economics of it. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | SimianSci 9 hours ago | parent | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
There is an understandable gap between the capabilities of closed models and those of open models. The current difference is primarily expressed in the cost of hardware necessary to sufficiently run a exactly comparable model. A single higher end graphics card running on your average gaming computer, is capable of running small to medium models that compare with those of their lab-born counterparts in the small-medium range. But the heavyweight models are still outside the realm of possibility for all but the most well-funded individual. However, I would highly suggest more people experiment with these smaller models. They are incredibly capable in many ways that many people dont realize. The perceived capabilities of the larger models are also much less the result of the model having more parameters/training cycles, but rather that they are being run through well-made harnesses, something which the open-source community is rapidly approaching with near-peer solutions of their own. In short, much of the gap between between open-weight models and the larger proprietary models can be considered more of an issue of perception and not an issue of capability. There is a fundamental gap economically, but not so much in capability. The open source community is rapidly closing the gap on these larger labs, especially thanks to the amazing research being freely given openly by well funded chinese labs. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | anigbrowl 9 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sort of. A full trillion-parameter model needs about $300k of server hardware to run in and a lot of electricity, making it feasible only for very wealthy individuals, but quite practical for businesses and institutions above a certain size...although they in turn would typically gatekeep access. You can drastically reduce the requirements by running models at a lower bitrate, which somewhat reduces accuracy but not that much - think of the difference between an MP3 vs uncompressed audio. With this and other tricks, you can get high end models down to a size where they can be run on a high spec desktop workstation affordable by an individual or small business. Obviously I'm heavily oversimplifying here. I think a useful parallel is to consider situations from the past where you would once have required corporate budgets equivalent to the price of a house to run a large database, but over time it became accessible to anyone with the requisite expertise and relatively affordable hardware. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | roadside_picnic 9 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
See my comment to parent. I've been using local LLMs for practical, personal tasks for a few months now very successfuly. You can run fantastic local models if you have either: - M-series Apple device with ideally >= 24GB of VRAM - RTX [345]090 GPU I'm fortunate enough to have both and use an M-series laptop as basically a persistent server (I don't use it much and when traveling typically just use my work laptop). My desktop doesn't act as a persitent server but I fire up llama.cpp on it all time for quick chat sessions. If you have one of the above devices and can dedicate it as server there are additional layers of tooling you can use that dramatically improve the experience. In particular Open WebUI allows you to add tons of useful tools (image gen, web search, code eval, etc), and agent harnesses like Hermes can make the current gen small models very capable. I have an agent in chat on my phone that basically handles all the sys-admin for the server it runs on. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | KronisLV 8 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> I mean from a financial and sustainability standpoint, assuming they’re equally powerful as their proprietary counterparts. Presently they trail SOTA by about 6-12 months, not on par (average across everything they do). DeepSeek V4 Pro with Max reasoning is very affordable even if you pay per-token, this month I pushed about 486 million tokens through it (I will admit that >95% was cache hits, for agentic development pretty typical) and it cost me about 8 USD in total. Meanwhile with Opus or even Sonnet if I had to pay API prices, I would be a more sad camper. That model makes a lot of stupid things though, so not ideal. Meanwhile GLM-5.2 that came out is also quote capable and is near Opus in many tasks, all while their coding plan is more cost effective than Anthropic's: https://z.ai/subscribe I will still stick with Anthropic but consider downgrading from Max 5x to Pro which will change the monthly expenses from around 108 EUR down to <20 EUR (they have a discount too if you pay for a year up front), and probably get the yearly GLM Pro plan which should decrease my yearly expenses from around 1300 EUR total to about 750 total EUR while still giving me a fairly decent setup. For the consumer, that is doable and practical. For the people actually running these models, who knows - at least DeepSeek and others are trying to make the models more efficient so the numbers are more feasible. Also have run Qwen3.6 35B A3B on prem and it kinda sucks. Way better than models that size a year ago, but still lags behind Sonnet and also DeepSeek V4 Flash due to the size limits. Plus to even run myself I'd need a pretty beefy setup, most likely a pair of Intel Arc Pro B70s with 32 GB of VRAM each that I could still run off of my PSU but the actual model output would be kinda bullshit and I'd have to spend an unpleasant amount of time fixing it. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | hatthew 9 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I'm also curious, specifically about the cost of training vs inference, and comparing that to other industries that can have high R&D costs. My instinct says that open weights aren't feasible because of the obvious issue where there is no incentive to develop your own model rather than just taking someone else's model. However, I could see a scenario where a hardware company designs a model that is open weights but optimized strongly for their own proprietary hardware, cutting their costs of inference low enough to be competitive with a hypothetical other company that doesn't have any R&D expediture. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | sosodev 9 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
It depends entirely on what you want to do and think is feasible. Small models can almost certainly run on the computer that you already have. They can do good tool calling. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | epolanski 9 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Yes they are you can use Qwen, DS4 Pro and GLM 5.2 if you have the hardware to do so. They are not SOTA in various ways but they have better economics. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | waffletower 9 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
If attractive, cloud providers could develop open models with their own investment, and sell hosted access as a business model. While Google checks these boxes, I haven't seen a Google much marketing focus upon their open models (Gemma) coupled with hosting. groq could conceivably train its own models, but groq's business model hosts open models (GPT OSS, Qwen 3, Llama 4 are currently their prominently advertised models on their site... which seems out of date to me) trained by other organizations. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | andrewstuart2 9 hours ago | parent | prev [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I hope/wonder if it will go the way computers did. We may learn to more effectively build RAM or parallel compute, and use it more effectively, in the coming decade in such a way that we can democratize more and more like we did with processors to the point that they're ubiquitous. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||