baq 13 hours ago
Serving barely useful GLM 5.2 costs what? $15k? Actually useful is like $50k? You’ll never recoup the cost, unless by ‘locally’ you mean ‘inference provider is not the model provider’?
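A rough way to put numbers on "never recoup" is to divide the hardware price by what it would replace each month. The sketch below uses the $15k and $50k figures from this comment and the $200/month subscription price mentioned elsewhere in the thread. It assumes the local box fully replaces the subscription and treats power and other running costs as one flat monthly figure (zero by default), which is optimistic; the function name is illustrative.

```python
def months_to_recoup(hardware_cost: float, monthly_subscription: float,
                     monthly_running_cost: float = 0.0) -> float:
    """Months until buying hardware costs less than paying a subscription.

    Assumes the local setup fully replaces the subscription and that running
    costs (electricity etc.) are a flat monthly amount.
    """
    monthly_saving = monthly_subscription - monthly_running_cost
    if monthly_saving <= 0:
        return float("inf")  # running costs eat the whole saving: never pays off
    return hardware_cost / monthly_saving


# $15k and $50k hardware against a $200/month plan, running costs ignored.
for cost in (15_000, 50_000):
    print(cost, months_to_recoup(cost, 200))
```

Under those assumptions that is 75 and 250 months respectively, i.e. roughly 6 and 21 years, before electricity is even counted.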
adrian_b 7 hours ago
The high costs are necessary for high speed. When a low speed of the order of one token per second is accepted, any open weights LLM can be run on an ordinary PC (with the weights read from SSDs) and the cost becomes negligible. Such a low speed would be annoying for a chat, but I do not believe that it is "barely useful" for a coding assistant. There are plenty of tasks for which it is fine to get results some hours later or even overnight, and batching multiple tasks can complete them in about the same time as a single task.
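The speed/cost trade-off follows from decoding being limited by how fast the weights can be read: every generated token needs one pass over the active weights, and a batch of sequences can share that pass. A back-of-the-envelope sketch, where the function name, the 32B active-parameter count, the 4-bit quantization and the 7 GB/s SSD read speed are all hypothetical inputs rather than figures for any particular model or drive:

```python
def decode_tokens_per_second(active_params: float, bytes_per_param: float,
                             read_bandwidth: float, batch_size: int = 1) -> float:
    """Upper bound on aggregate decode throughput when weight reads dominate.

    active_params   -- parameters touched per token (all of them for a dense
                       model, only the active experts for a MoE model)
    bytes_per_param -- 2.0 for FP16/BF16, about 0.5 for 4-bit quantization
    read_bandwidth  -- bytes/second at which the weights can be streamed
    batch_size      -- sequences decoded together; they share one weight pass
    """
    bytes_per_step = active_params * bytes_per_param
    steps_per_second = read_bandwidth / bytes_per_step
    return steps_per_second * batch_size


# Hypothetical inputs: 32B active parameters, 4-bit weights, 7 GB/s NVMe SSD.
for batch in (1, 8):
    rate = decode_tokens_per_second(32e9, 0.5, 7e9, batch)
    print(f"batch={batch}: {rate:.2f} tokens/s")
```

With those made-up inputs a single stream gets about 0.44 tokens/s and a batch of 8 about 3.5 tokens/s in aggregate. Real numbers will be lower once compute, KV-cache reads and random-access SSD performance are accounted for, and higher if part of the weights stays cached in RAM.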
fractorial 12 hours ago
Not "local" in the literal sense, but I set it up to serve at half quant for $23/hr and full quant for $35/hr. You don't need to have it always on? This is a far cry from "$200/month," but I do not think it's $50k for "useful." Do you see it differently?
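Since renting only costs money while the instance is up, a useful comparison is how many rented hours it takes to equal the purchase price. The sketch below pairs the $23/hr and $35/hr figures with the $15k and $50k hardware estimates from the parent comment; that pairing is an assumption, since those estimates were not tied to a specific quantization. It also ignores electricity, depreciation and the resale value of owned hardware, and the function name is illustrative.

```python
def breakeven_hours(hardware_cost: float, hourly_rental: float) -> float:
    """Hours of rented serving time that cost as much as buying hardware.

    Ignores electricity, depreciation and resale value of owned hardware.
    """
    return hardware_cost / hourly_rental


# Assumed pairing: $15k hardware vs. half quant at $23/hr,
# $50k hardware vs. full quant at $35/hr.
for cost, rate in ((15_000, 23), (50_000, 35)):
    hours = breakeven_hours(cost, rate)
    print(f"${cost:,} at ${rate}/hr -> {hours:.1f} h (~{hours / 8:.0f} eight-hour days)")
```

That works out to roughly 650 and 1,430 hours of on-demand use before buying would have been cheaper.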
dgellow 11 hours ago
Yes, they mean open-weight models offered by various providers.
verdverm 11 hours ago
$15k or $50k is pretty cheap, all things considered (a year ago it would have been more expensive, and one person can spend that much in a month or two). Since I bought my Spark the models have already improved (Qwen3.6, speculative decoding for 2x tgen, diffusion Gemma for 4x tgen), and I expect that to continue. Look out another 2-3 years and local is going to be very competitive.
jijji 5 hours ago
glm-5.2 is available for $20/month on ollama.com and is, IMHO, more functional than the $200/month Claude Max subscription. You can even use the same Claude harness [0]. You get about 20x more token usage at a tenth of the price.
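Taking the "20x more token usage" figure at face value (it is the commenter's estimate, not a published quota), the per-token comparison multiplies the two factors rather than just comparing monthly prices. A small sketch using exact fractions; the function name and the relative allowance units are illustrative choices:

```python
from fractions import Fraction


def relative_cost_per_token(price_a: int, tokens_a: int,
                            price_b: int, tokens_b: int) -> Fraction:
    """Cost per token of plan A divided by cost per token of plan B.

    Token allowances only need to share a unit, so plan B can be given an
    allowance of 1 and plan A expressed as a multiple of it.
    """
    return Fraction(price_a, tokens_a) / Fraction(price_b, tokens_b)


# Plan A: $20/month with ~20x the allowance; plan B: $200/month with 1x.
ratio = relative_cost_per_token(20, 20, 200, 1)
print(ratio)  # → 1/200
```

With 20x the allowance at a tenth of the price, each token costs 1/200 as much, so the claim amounts to roughly a 200x difference per token.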
polski-g 12 hours ago
You can recoup the costs quicker if you resell access to your local LLM on a reselling service.
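How much faster resale helps depends on how many idle hours actually get rented out. A minimal sketch; the function and every input in the example (resale price, idle hours, utilization) are hypothetical placeholders, not rates from any real reselling service:

```python
def payback_months(hardware_cost: float, idle_hours_per_month: float,
                   net_hourly_resale: float, utilization: float) -> float:
    """Months for resale income alone to cover the hardware cost.

    net_hourly_resale -- what you keep per rented hour after the platform's
                         cut and electricity
    utilization       -- fraction of idle hours buyers actually use (0..1)
    """
    monthly_income = idle_hours_per_month * net_hourly_resale * utilization
    if monthly_income <= 0:
        return float("inf")  # no resale income: never pays back this way
    return hardware_cost / monthly_income


# Hypothetical: $15k box, 500 idle hours/month, $3/hr net, half of idle hours sold.
print(payback_months(15_000, 500, 3.0, 0.5))
```

Those placeholder inputs give 20 months; any subscription fees the box also replaces would shorten that further.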