thegrim33 4 days ago

To play devil's advocate, how is your argument not a 'No True Scotsman' argument? As in, "oh, they had a negative view of X, well that's of course because they weren't testing the new and improved X2 model, which is different". Fast forward a year... "Oh, they have a negative view of X2, well silly them, they need to be using the Y24 model, that's where it's at, the X2 model isn't good anymore". Fast forward a year... ad infinitum.

Are the models that exist today a "true Scotsman" for you?

xwowsersx 4 days ago | parent | next [-]

It's not a No True Scotsman. That fallacy redefines the group to dismiss counterexamples. The point here is different: when the thing itself keeps changing, evidence from older versions naturally goes stale. Criticisms of GPT-3.5 don't necessarily hold against GPT-4, just like reviews of Windows XP don't apply to Windows 11.

cmiles74 4 days ago | parent | next [-]

IMHO, by filing people with a negative attitude toward AI products under "their priors are outdated", you effectively negate any argument from those people: because their priors are outdated, their counterexamples may be dismissed. That is, indeed, the No True Scotsman!

ludwik 4 days ago | parent [-]

I don't see a claim that anyone with a negative attitude toward AI shouldn't be listened to on the assumption that they must have formed their opinion on older models. The claim was simply that there's a large cohort of people who undervalue the capabilities of language models because they formed their views while evaluating earlier versions.

gmm1990 4 days ago | parent | next [-]

I don't think GPT-5 is any better than the previous ChatGPT. I know it's a silly example, but I was trying to trip it up with 8.6 - 8.11 and it got that right (0.49), but then it said 8.6 - 8.12 came out the opposite way, at -0.21.
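
For reference, the actual arithmetic; the model is presumably hitting the classic decimal-comparison failure, reading 8.12 as larger than 8.6:

    8.6 - 8.11 = 0.49    (model: 0.49, correct)
    8.6 - 8.12 = 0.48    (model: -0.21, wrong)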

I just don't see that much of a difference coding with either Claude 4 or Gemini 2.5 Pro. They're all fine, but the difference isn't changing anything in what I use them for. Maybe people are having more success with the agent stuff, but in my mind it's not that different from just forking a GitHub repo that already does what you're "building" with the agent.

barrell 4 days ago | parent | prev [-]

Yes, but almost by definition that is everyone who did not find value in LLMs. If you don't find value in LLMs, you're not going to use them all the time.

The only people you're excluding are the people who are forced to use them, and the random sampling of people who happened to try them recently.

So it may have been accidental or indirect, but yes, No True Scotsman would apply to your statement.

crote 4 days ago | parent | prev [-]

> The point here is different: when the thing itself keeps changing, evidence from older versions naturally goes stale.

Yes, but the claims do not. When the hypemen were shouting that GPT-3 was near-AGI, it still turned out to be absolute shit. When the hypemen were claiming that GPT-3.5 was thousands of times better than GPT-3 and beating all high school students, it turned out to be a massive exaggeration. When the hypemen claimed that GPT-4 was a groundbreaking innovation and going to replace every single programmer, it still wasn't any good.

Sure, AI is improving. Nobody is doubting that. But you can only claim to have a magical unicorn so many times before people stop believing that this time you might have something different from a horse with an ice cream cone glued to its head. I'm not going to waste a significant amount of my time evaluating Unicorn 5.0 when I already know I'll almost certainly end up disappointed.

Perhaps it'll be something impressive in a decade or two, but in the meantime the fact that Big Tech keeps trying to shove it down my throat even when it clearly isn't ready yet is a pretty good indicator to me that it is still primarily just a hype bubble.

trinsic2 4 days ago | parent [-]

It's funny how the hype train isn't responding to any real criticism of the false predictions and just carries on with the false narrative of AI.

I agree it will probably be something in a decade, but right now it has some interesting concepts, and I do notice over successive iterations of chat responses that it's got a ways to go.

It reminds me of Tesla owners buying into the "self-driving" terminology. Yes, the driver-assistance technology has improved quite a bit since cruise control, but it's a far cry from self-driving.

vlovich123 4 days ago | parent | prev [-]

How is that different from a world where today's models actually are usable for non-trivial things and more capable than yesterday's, and tomorrow's models will probably be more capable than today's?

For example, I dismissed AI three years ago because it couldn’t do anything I needed it to. Today I use it for certain things and it’s not quite capable of other things. Tomorrow it might be capable of a lot more.

Yes, priors have to be updated when the ground truth changes, and the capabilities of AI change rapidly. This is how chess engines on supercomputers became competitive in the 90s, then hybrid human-plus-engine systems were the leading edge, and then machines took over for good and never looked back.

Eggpants 4 days ago | parent [-]

It's not that the LLMs are better; it's that the internal tools/functions being called to do the actual work are better. They didn't spend millions to retrain a model to statistically output the number of r's in "strawberry"; they just offloaded that trivial question to a function call.

So I would say the overall service provided is better than it was, thanks to functions being built based on user queries, but not the actual models themselves.
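
A minimal sketch of that kind of offloading, assuming a hypothetical count_letter tool and a toy dispatcher (the tool name and message format here are illustrative, not any vendor's actual API):

    # Hypothetical deterministic tool the model can call instead of
    # "reasoning" over individual letters token by token.
    def count_letter(word: str, letter: str) -> int:
        return word.lower().count(letter.lower())

    # Toy dispatcher: if the model emits a tool call, run the function
    # and return its exact result; otherwise pass the text through.
    def answer(model_output: dict) -> str:
        if model_output.get("tool") == "count_letter":
            args = model_output["args"]
            n = count_letter(args["word"], args["letter"])
            return f'There are {n} "{args["letter"]}"s in "{args["word"]}".'
        return model_output["text"]

    # e.g. the model emits a structured call instead of guessing:
    print(answer({"tool": "count_letter",
                  "args": {"word": "strawberry", "letter": "r"}}))
    # -> There are 3 "r"s in "strawberry".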

vlovich123 4 days ago | parent | next [-]

LLMs are definitely better at codegen today than three years ago. There are quantitative benchmarks as well as my own qualitative experience (even allowing for the gaming that companies engage in).

It is also true that the tooling and context management have gotten more sophisticated (often using models themselves, by the way). That doesn't negate that the models have gotten better at reliable tool calling, so the LLM is driving more of the show rather than having purpose-built coordination bolted on around it, and that codegen quality is higher than it used to be.

int_19h 3 days ago | parent | prev [-]

This is a good example of making statements that are clearly not based in fact. Anyone who works with these models knows full well what a massive gap there is between, e.g., GPT-3.5 and Opus 4.1, one that has nothing to do with the ability to use tools.