slopinthebag 4 hours ago

I didn't get the sense that the author is nervous. What I tend to see are people who are nervous that going all-in on LLM workflows might not have the payoff they are expecting, and are becoming increasingly fanatical as a result.

Just one more harness bro. Just one more agentic swarm. Please bro, just one more Claude Max subscription. Please bro.

atleastoptimal 2 hours ago | parent | next [-]

Complaining about every one-off issue with LLMs ignores the bigger picture: they are getting better every month, and there is no fundamental reason why they wouldn't surpass humans in coding. Everything else is secondary.

All I would need from an LLM doubter is evidence that, at tractable software engineering tasks, LLMs are not improving. The strongest argument against the increasing general capabilities of LLMs is the ARC-AGI tasks; however, the creators admit that each generation of LLMs exceeds their expectations, and that AGI will be achieved within the decade.

wavemode an hour ago | parent [-]

Your logic is flawed: a thing can improve for an infinite amount of time while never surpassing a certain limit. It's called an asymptote.
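To make that concrete with a toy example (the numbers are illustrative, not a claim about any particular model): a score can climb strictly upward every single generation and still never cross a fixed ceiling.

```python
# Toy illustration of an asymptote: each "generation" closes half of the
# remaining gap to the ceiling, so the score rises forever but never arrives.
def improved_score(score, ceiling=100.0):
    """Return a strictly better score that still stays below the ceiling."""
    return score + (ceiling - score) / 2

score = 50.0
for generation in range(20):
    score = improved_score(score)

# After 20 generations the score is within a hair of 100...
assert score > 99.99
# ...yet it can never actually reach the limit.
assert score < 100.0
```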

That said, I don't think arguing about this from a mathematical perspective is a worthwhile use of time. Calling something an asymptote in the first place requires defining a quantifiable "X" and "Y", which we don't have. What we have are a bunch of synthetic benchmarks.

Even ignoring that the answers to benchmark questions are known to regularly leak into the training data (in other words, scores can increase while capabilities stay the same), performance on benchmarks is simply not the same thing as performance in the real world. Being able to answer some arbitrary set of questions that the previous model couldn't does not correlate in any quantifiable way with a specific amount of real-world improvement.

The OP article focuses on research papers which assess real-world impact of LLMs within software organizations, which I think are more representative.

I wouldn't call myself an "AI doubter" - I use LLMs every day. When you say "doubter", you're not referring to "AI" in general, or to the claim that AI is helpful or boosts productivity (which I believe it does). You're referring to the very specific, very extraordinary claim that LLMs will surpass humans in coding. If that's what you mean, then yeah, I'm a doubter, at least on any foreseeable timescale.

aspenmartin 4 hours ago | parent | prev [-]

You say this as though performance has not shown a very clear and extremely rapid improvement in a startlingly short amount of time.

You’re definitely right that people adopt agentic workflows and come away disappointed or worse, but the point is that the disappointment has already diminished substantially and will continue to do so. We know this because we know the scaling laws, and because learning theory has been around for many decades.
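For reference, the scaling laws invoked here are usually stated as power laws with an irreducible loss term. A minimal sketch of that functional form (the constants below are made up for illustration, not fitted values):

```python
# Chinchilla-style loss scaling in parameter count N:
#   L(N) = E + A / N**alpha
# E is the irreducible loss floor; A and alpha here are illustrative only.
E, A, alpha = 1.7, 400.0, 0.34

def loss(n_params):
    """Predicted loss for a model with n_params parameters."""
    return E + A / n_params ** alpha

# Loss keeps falling as models grow...
assert loss(1e9) > loss(1e10) > loss(1e11)
# ...but it approaches the floor E, not zero.
assert abs(loss(1e12) - E) < 0.1
```

Note that even on this optimistic reading, the law predicts diminishing returns toward a floor rather than unbounded improvement.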

strange_quark 2 hours ago | parent | next [-]

What rapid improvement? In this six-month AI coding fever dream we've been living in, I really haven't seen anything new in a while, either in terms of new ideas for AI coding or in new consumer products and services.

I'll grant that the coding harnesses themselves are better, because that was a new product category with a lot of low-hanging fruit, but have the models actually improved in a way that isn't just benchmaxxing? I'd argue the models seem to be regressing. Even the most AI-pilled people at my company have complained that Opus 4.7 is a dud. Anecdotally, GPT 5.5 seems decent, but it's rumored to be a 10T-parameter model, isn't noticeably better than 5.4 or 5.3, is insanely expensive to use, and seems to be experiencing model collapse, since the system prompt has to beg the thing not to talk about goblins and raccoons.

jatora 2 hours ago | parent [-]

Uninformed opinion from someone who clearly doesn't use AI coding tools consistently. And why are you limiting it to six months? What's wrong with you?

zapataband1 23 minutes ago | parent [-]

How many years of real-life, in-production problem-solving and coding have you done? That's what I base how informed you are on, not how much you use your favorite new $100/month token-prediction subscription.

cyclopeanutopia 3 hours ago | parent | prev | next [-]

Perhaps you are confusing performance with instability?

paganel 3 hours ago | parent | prev | next [-]

> very clear and extremely rapid improvement in a startlingly short amount of time.

We're almost 6 months into all this AI-code madness and I've yet to see that "rapid improvement" you mention. As in, software products that are genuinely better compared to 6 months ago, or new software products (and good ones at that) which would not have existed had this AI craze not happened.

leptons an hour ago | parent | prev | next [-]

You say this as though AI company debt has not followed a very clear and extremely rapid ballooning in a startlingly short amount of time.

It's the "YOLO" of business strategies.

slopinthebag 3 hours ago | parent | prev [-]

Yes, but we don't know the shape of the curve or where we are on it.