Another day, another HN thread of "this model changes everything", followed immediately by a reply stating "actually I have the literal opposite experience and find a competitor's model is the best", repeated until it's time to start the next day's thread.

StephenHerlihyy 3 hours ago

What amazes me the most is the speed at which things are advancing. Go back a year, or even two, and you can see how all these incremental improvements have compounded. Things that used to require real effort to solve consistently, whether with RAG or context/prompt engineering, have become… trivial. I totally agree with your point that each step along the way doesn’t necessarily change that much. But in the aggregate it’s sort of insane how fast everything is moving.

	Rudybega an hour ago
		The denial of this overall trend on here and in other internet spaces is starting to really bother me. People need to have sober conversations about the speed of this increase and what kind of effects it's going to have on the world.

clhodapp 3 hours ago

And of course the benchmarks are from the school of "It's better to have a bad metric than no metric", so there really isn't any way to falsify anyone's opinions...

SatvikBeri 3 hours ago

I use Claude Code every day, and I'm not certain I could tell the difference between Opus 4.5 and Opus 4.0 if you gave me a blind test.

malshe 4 hours ago

This pretty accurately summarizes all the long discussions about AI models on HN.

cactusplant7374 3 hours ago

Hourly occurrence on /r/codex. Model astrology is about the vibes.

wasmainiac 4 hours ago

[flagged]

nocman 4 hours ago

> Who are making these claims? script kiddies? sr devs? Altman?

AI agents, perhaps? :-D

locknitpicker 4 hours ago

> All anonymous as well. Who are making these claims? script kiddies? sr devs? Altman?

You can take off your tinfoil hat. The same model can perform differently depending on the programming language, the frameworks and libraries employed, and even the project. Context also matters: a model's output varies greatly depending on your prompt history.

	andrepd 3 hours ago
		It's hardly tinfoil to understand that companies riding a multi-trillion dollar funding wave would spend a few pennies astroturfing their shit on HN, or overfit to benchmarks that people take as objective measurements.

BoredPositron 4 hours ago

Keeping his ramblings on Twitter and the company blog in mind, I bet he is a shitposter here.