usernamed7 13 hours ago

Let us hope this only accelerates the proliferation of local models

baq 13 hours ago

Serving barely useful GLM 5.2 costs what? $15k? Actually useful is more like $50k? You'll never recoup the cost, unless by 'locally' you mean 'the inference provider is not the model provider'?

adrian_b 7 hours ago

The high costs are necessary for high speed.

When a low speed of the order of one token per second is accepted, any open weights LLM can be run on an ordinary PC (with the weights read from SSDs) and the cost becomes negligible.

Such a low speed would be annoying for a chat, but I do not believe that it is "barely useful" for a coding assistant. There are plenty of tasks for which it is fine to get results some hours later or even overnight, and batching multiple tasks can complete them in about the same time as a single task.
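For a sense of scale at the speed described above, here is a quick back-of-the-envelope sketch. The 8-hour overnight window and the batch size of 4 are assumptions for illustration, and the batching line simply takes the comment's claim (a batch finishes in about the same wall-clock time as one task) at face value rather than measuring it.

```python
SECONDS_PER_HOUR = 3600


def tokens_in_window(tokens_per_second: float, hours: float, batch_size: int = 1) -> int:
    """Total tokens generated in a time window.

    Assumes (per the comment, not measured) that a batch of tasks finishes in
    about the same wall-clock time as a single task, so every sequence in the
    batch still advances at roughly `tokens_per_second`.
    """
    return int(tokens_per_second * hours * SECONDS_PER_HOUR * batch_size)


# Hypothetical overnight run: 8 hours at ~1 token/s.
print(tokens_in_window(1, 8))                # 28800 tokens for one task
print(tokens_in_window(1, 8, batch_size=4))  # 115200 tokens across four tasks
```

Whether ~28k tokens per task per night is enough depends heavily on how many iterations the task needs.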

QuiEgo 7 hours ago

I don't know. Even the frontier models do dumb things sometimes. Being able to iterate (and iterate quickly) is really important. If you get 1 try a day, you're probably back to it being better to just code by hand. Also, you're going to get absolutely outpaced by anyone who uses AI that goes faster.

So maybe for a hobby project this is fine, but for something you have to take to market and compete with... I think it'd be a really rough sell.

EDIT: also, just to be clear: if there was a practical path to using local AI, I'd take it in a heartbeat. I hope it gets to the point that it's better to use local than paying someone $200/mo. But right now, that $200/mo is the clear best option. I get making compromises for ideology but the compromises are too big for me right now.

fractorial 12 hours ago

Not "local" in the literal sense, but I set it up to serve at half quant for $23/hr and full quant for $35/hr.

You don't need to have it always on? This is a far cry from "$200/month," but I do not think it's $50k for "useful." Do you see it differently?
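The rent-vs-buy question can be framed as a simple break-even: purchase price divided by hourly rental rate. The sketch below plugs in the figures from this thread ($15k/$50k from the parent comment, $23/hr and $35/hr from this one). Pairing the $15k rig with half quant and the $50k rig with full quant is an assumption, and power, resale value, and idle time are ignored.

```python
def break_even_hours(purchase_usd: float, rent_usd_per_hour: float) -> float:
    """Hours of rented serving time that cost as much as buying the hardware.

    Assumption: ignores electricity, resale value, and maintenance.
    """
    return purchase_usd / rent_usd_per_hour


# (purchase price, hourly rental rate) pairs taken from the thread;
# the pairing itself is illustrative.
scenarios = {
    "$15k rig vs $23/hr (half quant)": (15_000, 23),
    "$50k rig vs $35/hr (full quant)": (50_000, 35),
}

for label, (purchase, rate) in scenarios.items():
    hours = break_even_hours(purchase, rate)
    print(f"{label}: {hours:,.0f} h (~{hours / 8:,.0f} eight-hour days)")
```

At these rates, renting only loses to buying after roughly 650 hours ($15k) or 1,430 hours ($50k) of actual use.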

dakolli 11 hours ago

This is probably the dumbest possible way to do it. Just buy tokens through OpenRouter and you could run it all month, 24/7, at 100 tps for practically nothing. There are tons of ways to pay for things without giving out your personal information.

greenavocado 6 hours ago

  100 tokens/s × 1 month × ($0.14 / 1M tokens) ≈ $37
$37 for the input tokens for DeepSeek V4 Flash if you miss the cache every time.

A decent deal, but Flash is quite dumb and you still have to pay for output tokens.
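The estimate above can be checked directly. This sketch assumes continuous generation at 100 tokens/s for a 30- or 31-day month and the quoted $0.14 per million input tokens (cache misses only); output tokens are not included.

```python
SECONDS_PER_DAY = 86_400


def monthly_input_cost(tokens_per_second: float, usd_per_million: float, days: int = 30):
    """Return (tokens, USD) for running nonstop for `days` days."""
    tokens = tokens_per_second * SECONDS_PER_DAY * days
    return tokens, tokens / 1_000_000 * usd_per_million


for days in (30, 31):
    tokens, cost = monthly_input_cost(100, 0.14, days)
    print(f"{days}-day month: {tokens / 1e6:.1f}M tokens -> ${cost:.2f}")
```

That gives about $36.29 for a 30-day month and $37.50 for a 31-day month, so the ~$37 figure holds up.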

dgellow 11 hours ago

Yes, they mean open-weight models offered by various providers.

verdverm 11 hours ago

$15k or $50k is pretty cheap, all things considered (a year ago it would have been more expensive; one person can spend that in a month or two).

I bought my Spark, and the models have already improved since then (Qwen3.6, speculative decoding for 2x tgen, diffusion Gemma for 4x tgen), and I expect this to keep improving. Look another 2-3 years out and local is going to be very competitive.

jijji 5 hours ago

GLM-5.2 is available for $20/month on ollama.com and is, IMHO, more functional than the $200/month Claude Max subscription. You can even use the same Claude harness [0]. You get about 20x the token usage at a tenth of the price.

[0] https://ollama.com/library/glm-5.2
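Taking those figures at face value (the 20x usage multiple is the commenter's estimate, not a published quota), the per-dollar difference compounds: 20x the tokens at a tenth of the price works out to roughly 200x the tokens per dollar.

```python
def tokens_per_dollar_multiple(usage_multiple: float, cheap_price: float, expensive_price: float) -> float:
    """How many times more tokens per dollar the cheaper plan gives."""
    return usage_multiple * (expensive_price / cheap_price)


# $20/month plan with ~20x the usage vs. a $200/month plan (figures from the comment).
print(tokens_per_dollar_multiple(20, 20, 200))  # 200.0
```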

polski-g 12 hours ago

You can recoup the costs quicker if you resell access to your local LLM on a reselling service.

baq 11 hours ago

When I last ran the numbers, just buying T-bills came out ahead.

nairboon 13 hours ago

It will. Moves like this will only push brains and talent toward tweaking and tuning open harnesses and open models.

forgetfreeman 13 hours ago

There is the undocumented 3rd option of simply shrugging and moving on without LLMs, you know, business as usual.

baq 13 hours ago

That ship has sailed. Even if you never so much as tab-complete in Cursor, if you don't let LLMs review your code you're very, very behind, unless you're in a deeply specialized domain with no public training data available. In anything remotely public, you're just outpaced.

preg_match 6 hours ago

It might just be fine to be outpaced. Software isn’t actually infinite, it has a purpose and does things. If it does the things it needs to do then… great! Maybe you’re done. And maybe you were done 20 years ago.

inigyou 13 hours ago

Mythos found one low-severity vulnerability in curl.

forgetfreeman 12 hours ago

Is this your first tech industry hype cycle or something?

baq 11 hours ago

No, it’s my experience from the past 6 months

forgetfreeman 8 hours ago

Heh. I vividly remember the hype cycle around self-driving cars. Roll the tape forward a decade or so and combined R&D spend approaches the GDP of a small industrialized country. Untold millions of column inches, close to a decade of hyperventilating FOMO hype mill output. Net result: some cab companies ended up filing for bankruptcy, but really Uber did that.

Crypto bros' early claims that blockchain would threaten sovereign nations' ability to collect taxes by ushering in an era of perfect anonymity for financial transactions...

Glassy-eyed consultants convincing basically everyone that introducing electronic devices into classrooms would usher in a new era of human achievement...

As a software engineer, it took me a couple more decades than it should have to realize that the tech industry, and especially the tech industry in CA, runs entirely on bullshit.

baq 4 hours ago

I don’t care about hype cycles too much, I care about the value I, my team and the teams I work with are getting out of the technology and it is objectively revolutionary. I don’t run token ladders, I don’t play stupid status games, I use the tech because it’s a step function change in most workflows. You can call it hype, I’m calling it a dystopian rat race, the name doesn’t matter as long as we both have mouths to feed.

fragmede 4 hours ago

> Net result:

The future is here, but unevenly distributed. Waymo operates in a select few cities, but in those cities you can call a car, that car will have no human driver in it, and the computer will drive you to your destination. Yes, it's taken a long time, but if your "evidence" is self-driving cars, you might want to address your priors.

nunez 9 hours ago

Not really.

jckahn 13 hours ago

That's not the option most are going to take.

forgetfreeman 12 hours ago

*shrug* Not really a me problem, but I'd counsel taking an afternoon to reflect on what part of any of this is actually inevitable. You know, maybe come up for air for a minute and examine the industry hype from 30,000 ft.

usernamed7 12 hours ago

That's a choice you are free to make, just like you're free to shrug and not use the internet or computers.

forgetfreeman 11 hours ago

*eyeroll* If you truly had the courage of your convictions, you would have gone all in here and told me to stop using electricity.

usernamed7 7 hours ago

I haven't told you to do anything, only highlighted that you can choose how to live your life, including not using LLMs.

Believe it or not, some people actually do derive a great deal of value from LLMs, and it's also ok if you don't or can't.

forgetfreeman 6 hours ago

"can't"

Still feeling chippy over there I see.

Now would be a pretty good time to define "value". If folks find themselves in a position where statistically averaged word salad, or time sunk combing work product for hallucinations, equates to value, that's less an endorsement of the technology than a degradation of the term.

selfhoster11 5 hours ago

Executive dysfunction mitigation. Voice-based interfaces. Heavyweight personal file classification with a few hours of prompt building, vs. labelling a bespoke classifier's data set and training a more "lightweight" option over days or weeks. Language translation for random websites that isn't DeepL or Google Translate. They are not deterministic, but on these tasks the error rate is a lot lower than with classical approaches.

i2km 13 hours ago

Ridiculous. Haven't you heard? All critical thinking skills have long since been sacrificed on the altars of the AI gods and it's inconceivable that we write any code the old way. If you actually understand your code it means you're a luddite and are going to be left behind. /s