binary0010 4 hours ago

So how do openai and anthropic plan to keep customers when GLM-5.1 is just as good and open source and a lot cheaper?

I don't see the business model working. My closest friend actually does automation software for large companies.

He does not use Claude or openai at all. He primarily uses gpt 120b on cerebras and glm-5.1 for heavy thinking work. And some other small models for various tasks. All open source.

And these systems are extremely useful for the businesses and are able to run fully automated pipelines that are very stable and fast.

We discuss this a lot, and we both think any business doing heavy agentic work on Claude and openai just isn't aware of exactly how good and cheap open source has gotten in the last year.

So... once the legacy businesses and developers catch up, won't Claude and openai be unable to recoup their costs?

doug_durham an hour ago | parent | next [-]

GLM-5.1 isn't just as good. It is no match for Opus running in Claude Code. Please try it yourself. Open source models are about a year behind at least.

osti an hour ago | parent [-]

For coding I wouldn't say a year: this time last year, claude or gpt definitely weren't able to do what GLM is able to do today. But easily 6 months, I'd say.

Not sure about other domains though.

peder 3 hours ago | parent | prev | next [-]

> I don't see the business model working.

Same. It's a nightmare from a Porter's Five Forces perspective.

There will be a ton of businesses competing in this space, and there will be something of a moat due to how capital intensive the business can be, but there will still basically be infinite competitors.

Great for consumers.

ex-aws-dude an hour ago | parent [-]

Well in reality AWS will just host one of them and most companies will use that

Like how snapchat kind of fell off because the feature could just be a subset of instagram

It seems like it would just become a commodity like EC2

smokel 4 hours ago | parent | prev | next [-]

For coding assistance, I have tried OpenCode with several large open models through OpenRouter. All were fairly bad compared to Claude Opus. Could you provide some hints on how I should be holding these open models so that I might get more value out of them?

I agree with the common trope that open models lag behind by about a year, but something magical happened just around a year ago when the state of the art models became extremely useful. By this reasoning we're about to see open models perform well, but I'm afraid there is more to it than just waiting for another revolution around the sun.

Note, my application is coding assistance. Open models can be great for other purposes.

tariky 2 hours ago | parent | next [-]

I tried almost all OS models on opencode; none of them is on the level of opus 4.7.

In my latest experiment I used opus for the implementation plan, then used cursor composer 2.5 for execution.

I must say that combo is really good. The main drawback of claude code is that it is super slow. So when paired with composer, which is super fast, it flies.

cainxinth 2 hours ago | parent [-]

No one is claiming that OS is as good. They are saying it isn't that far behind SOTA commercial products. So why pay exorbitantly just to get something only a few percent better than the free option?

But there have been very good open source office apps for decades and few enterprises use them, so perhaps this is just the nature of B2B purchasing committees and 'nobody getting fired for buying IBM.'

slopinthebag 2 hours ago | parent | prev [-]

Do more planning yourself, be smart about the context, break down tasks into smaller components, give it more guidance. You can't just lazily prompt it to complete large features autonomously and expect good results.

aniceperson 19 minutes ago | parent | next [-]

a good harness is supposed to do what you are describing. sonnet on pi.dev is pretty terrible but fast. Claude Code has ridiculous amounts of prompt engineering at system prompt level and sub-session spawning combined with low temperature, to provide the predictable results people like. CC screws up and you never see it, because the harness auto-corrects, while on OSS you see everything, and it does not come with that level of monitoring by default.

amilios 2 hours ago | parent | prev | next [-]

But if the closed-source models can do this without the additional effort, that's a significant gap, no?

10000truths 2 hours ago | parent | next [-]

The point is that the price gap is so much larger than the capability gap, that even with the extra compute needed to make up for the lack of capability, you can still come out ahead in terms of amortized $/work done.
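A rough sketch of that amortized $/work arithmetic, in Python. The per-token prices reuse the $30/M and $0.1/M figures floated elsewhere in this thread; the token count per attempt and both success rates are made-up assumptions for illustration, not measurements of any model.

```python
def cost_per_completed_task(price_per_mtok, tokens_per_attempt, success_rate):
    """Expected dollars per successfully completed task.

    Assumes failed attempts are simply retried and that attempts are
    independent, so the expected number of attempts is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = price_per_mtok * tokens_per_attempt / 1_000_000
    expected_attempts = 1 / success_rate
    return cost_per_attempt * expected_attempts


# Hypothetical inputs: 200k tokens per attempt for both models,
# 90% success for the expensive model, 50% for the cheap one.
closed = cost_per_completed_task(30.0, 200_000, 0.9)
open_ = cost_per_completed_task(0.1, 200_000, 0.5)

print(f"closed: ${closed:.2f} per completed task")
print(f"open:   ${open_:.2f} per completed task")
print(f"ratio:  {closed / open_:.0f}x")
```

Under these assumptions the cheap model can fail half the time and still cost far less per finished task. The sketch ignores the human time spent on extra guidance and review, which is the cost the other side of this argument is pointing at.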

flexagoon an hour ago | parent | prev | next [-]

Is it really when they are hundreds of times more expensive?

eikenberry an hour ago | parent | prev | next [-]

That is the 3-6 month sota-open gap people talk about, a time-window that continues to move as new models are released on both sides.

bigfishrunning 2 hours ago | parent | prev [-]

See that's the thing, they can't. Every model needs hand holding and guidance.

amilios an hour ago | parent [-]

some require less hand-holding than others though

eikenberry 2 hours ago | parent | prev [-]

+1 .. just wanted to reiterate that this is the answer. The open models work great if you just do a little more of the design/architectural work up front and organize your work appropriately.

mesmertech 4 hours ago | parent | prev | next [-]

For coding you always want to go with the best model in the category, not something that would have been the best model a year ago, which is what GLM 5.1 is. And I'm saying that as a big fan of GLM, because I run a translation site where GLM is good enough for the price.

Most of the money right now is in coding. Openai and Anthropic just have to be 6 months ahead of SOTA open source models and they'll capture most of the enterprise and dev market

binary0010 4 hours ago | parent | next [-]

Yes, I'm an engineer (20 years, mostly in the games/graphics industry) and only use it for code. I've been using glm 5.1 a lot this week. I went in expecting another "decent" but not really "up to standard" open source model.

I highly doubt I'll ever use Claude again.

I think you are wrong about Claude being significantly better.

cassianoleal 3 hours ago | parent [-]

I've been mostly coding with GLM-5.1 as well and I agree with you. DeepSeek V4 Flash is another very good surprise. Incredibly cheap, fast and effective.

odie5533 2 hours ago | parent | prev | next [-]

If I generate code with Claude, ChatGPT, and GLM 5.1, I can't say which model is which reliably. I exclusively use Claude more out of superstition than reason.

eikenberry an hour ago | parent | prev | next [-]

> For coding you always want to go with the best model in the category [..]

And this is why many companies go out of business. You always want the best bang for your buck, sometimes this is the "best model" and sometimes it is not.

kgwgk 4 hours ago | parent | prev | next [-]

For coding, like for everything else in life, cost is a factor.

mesmertech 3 hours ago | parent [-]

Cost for the value delivered. Like if you offered the current SOTA open source models at $0.1/M, I still think I'd be using Opus or 5.5 at $30/M. Or say GPT 5 which was released Aug 25, I don't think I'd use it for coding for even $0.1. I'd def find other uses for it (translations, agentic workflows, prompt guards etc), but for coding I don't think I'd ever completely switch to a SOTA open model.

Unless ofc there was an actual speed difference: the only reason I'd be willing to go with a model a couple of percent worse than the current best is if the speed was at least 5x higher. Looking forward to kimi k2.6 offered publicly by Cerebras

kgwgk 3 hours ago | parent [-]

> I still think I'd be using

That's fine. Other people may not want to pay 300x more and will rather make do with last year's SOTA.

> For coding you always want to go with the best model

Maybe you meant "For coding I always want to go with the best model"?

Andrex 2 hours ago | parent | prev | next [-]

> For coding you always want to go with the best model in the category

Will this always be true? There will never be an event horizon/point of diminishing returns where something not-bleeding-edge is "good enough" for 51%+ of users?

blackjack_ 2 hours ago | parent | prev | next [-]

This is a silly take. There is a line of "good enough" for most coding (most CRUD apps and APIs are nothing special), and once we are past that, nobody will care about having the "newest, best" model except extreme outliers. And this base "good enough" model will become an ultra cheap commodity as we already see with GLM, deepseek, etc.

dogleash 2 hours ago | parent | prev | next [-]

> For XXX you always want to go with XXX, not XXX

Oh, hey, I recognize you. Thank you for the very forward and thorough orbital sander recommendation at Home Depot. That's exactly what I wanted to deal with on my holiday weekend. You just know so much about this and the rest of us are simple passersby.

EGreg 4 hours ago | parent | prev [-]

Most work is not coding.

And also, people have it wrong… their models are not the main problem anymore. It’s the RAG

tomrod 2 hours ago | parent | next [-]

Would love to hear more of your thoughts about RAG.

simonw 2 hours ago | parent [-]

I think RAG is a mostly outdated concept now; it's been subsumed by the idea of an "agent harness", which is exactly what Claude Code and Claude Cowork and OpenAI Codex and Claude.ai and ChatGPT themselves have now become.

An agent harness with access to a good search tool is a much more interesting thing than 2024-era RAG systems.

obsidianbases1 4 hours ago | parent | prev [-]

Depending on RAG is a workflow problem, not an AI problem

e2e4 an hour ago | parent | prev | next [-]

Agree. Also reasonix with deepseek is super cheap and quality is only slightly worse (in my experience)

IAmGraydon an hour ago | parent | prev | next [-]

The only way I see it working out for them is if some legislation is passed that eliminates the competition by making it illegal to run local models. They could claim that the models are dangerous and could be weaponized without oversight, or something along those lines.

csomar an hour ago | parent | prev [-]

They are both (and also spacex) sprinting for IPOs. They know that the opportunity window is closing fast and that advancement in model quality has largely plateaued in the last year. Take as much investor money as you can get away with for now.