vonneumannstan 5 days ago

>I think the "just tweak the prompts bro" people are missing out on learning.

Alternatively they're just learning/building intuition for something else. The level of abstraction is moving upwards. I don't know why people don't seem to grok that the level of the current models is the floor, not the ceiling. Despite naysayers like Gary Marcus, there is in fact no sign of scaling or progress slowing down at all on AI capabilities. So it might be that, if there is any value left in human labor in the future, it will be in being able to get AI models to do what you want correctly.

Brian_K_White 5 days ago | parent | next [-]

Wishful, self-serving, and beside the point. The primary argument here is not about the capability of the AI.

I think the same effect has been around forever: every boss/manager/CEO/rando-divorcee-or-child-with-money using employees to do their thinking, just as today's information-handling worker or student uses an AI to do theirs.

vonneumannstan 5 days ago | parent [-]

>Wishful, self-serving, and beside the point. The primary argument here is not about the capability of the AI.

"Alternatively they're just learning/building intuition for something else."

Reading comprehension is hard.

benterix 5 days ago | parent | prev | next [-]

That would be true if several conditions were fulfilled, starting with LLMs actually being able to do their tasks properly. They still very much struggle with this, which basically defeats the premise of moving up an abstraction layer: if you have to constantly check and correct the lower layer, you haven't really moved up.

vonneumannstan 5 days ago | parent [-]

[flagged]

lazide 5 days ago | parent | prev | next [-]

I remember this exact discussion (and exact situation) with WYSIWYG UI design tools.

They were still useful, and did solve a significant portion of user problems.

They also created even more problems, and no one really went out of work long term because of them.

asveikau 5 days ago | parent | prev | next [-]

This reads to me as extremely defensive.

vonneumannstan 5 days ago | parent [-]

It's not, but OK. I'm just responding to another version of the "this generation is screwed" complaint that has been recurring literally since Socrates.

jplusequalt 5 days ago | parent | next [-]

There has been a growing amount of evidence for years that modern technology is not without its side effects: mental health issues from social media use, destruction of attention spans among the youth from cell phone use, erosion of societal discourse and straight-up political manipulation, and now impacts to cognitive ability from LLMs.

thegrim33 5 days ago | parent | prev [-]

So... people in the past were supposedly wrong about the next generation being "screwed", and therefore it is completely impossible for any new generation, at any point in history, to ever be in any way worse off or more screwed than previous generations? Because some people in the past were supposedly incorrect with similar assertions?

jimkri 5 days ago | parent | prev | next [-]

I don't think Gary Marcus is necessarily a naysayer; I take it that he is trying to get people to be mindful of current AI tooling and its actual capabilities, and that there is more to do before it is what it is being marketed as. Like, GPT-5 seems to be an additional feature layer of game-theory examples. Check LinkedIn for how people think it behaves and you can see patterns. But they market it as much more.

vonneumannstan 5 days ago | parent [-]

>I don't think Gary Marcus is necessarily a naysayer

Oh come on. He is by far the most well-known AI pooh-pooher, and it's not even close. He built his entire brand on that once he realized his own research was totally irrelevant.

KoolKat23 5 days ago | parent | prev | next [-]

Agree with this.

I mean, the guy assembling a thingymajig in the factory can, after a few years, put it together with his hands 10x faster than the actual thingymajig designer. He'll tell you to apply some more glue here and less glue there (it's probably slightly better, but immaterial really). However, he probably couldn't tell you what the fault tolerance of the item is; the designer can do that. We still outsource manufacturing to the guy in the factory regardless.

We just have to get better at identifying the risks of having LLMs do the grunt work, and better at mitigating them. As you say, abstracted.

codyb 5 days ago | parent | prev | next [-]

Really? No signs of slowing down?

A year or two ago when LLMs popped on the scene my coworkers would say "Look at how great this is, I can generate test cases".

Now my coworkers are saying "I can still generate test cases! And if I'm _really pacificcccc_, I can get it to generate small functions too!".

It seems to have slowed down considerably, but maybe that's just me.

lazide 5 days ago | parent | next [-]

At the beginning, it’s easy to extrapolate ‘magic’ to ‘can do everything’.

Eventually, it stops being magic and the thinking changes - and we start to see the pros and cons, and see the gaps.

A lot of people are still in the ‘magic’ phase.

vonneumannstan 5 days ago | parent | prev | next [-]

Yeah, NGL, if you can't get a model that is top 1% in competitive coding and IMO gold-medal tier to do anything useful, that's just an indictment of your skill level with them.

tuesdaynight 5 days ago | parent | prev [-]

Sorry for the bluntness, but you sound like you have a lot of opinions about LLM performance for someone who says they don't use them. It's okay if you are against them, but if you last used them 3 years ago, you have no idea whether there have been improvements or not.

Jensson 5 days ago | parent | next [-]

You can see what people built with LLMs 3 years ago and what they build with LLMs today, and compare the two.

That is a very natural and efficient way to do it, and also more reliable than using your own experience since you are just a single data point with feelings.

You don't have to drive a car to see where cars were 20 years ago, see where cars are today, and say: "it doesn't look like cars will start flying anytime soon".

tuesdaynight 4 days ago | parent [-]

Fair, but what about saying that cars didn't improve in 20 years (the last time you drove one) because they are still not flying?

Peritract 5 days ago | parent | prev | next [-]

> you sound like you have a lot of opinions about LLM performance for someone who says that doesn't use them

It's not reasonable to treat only opinions that you agree with as valid.

Some people don't use LLMs because they are familiar with them.

tuesdaynight 4 days ago | parent [-]

My point is that this person IS NOT familiar with them, while feeling confident enough to say that these tools didn't improve with time. I'm not saying their opinions are invalid, just highlighting the lack of experience with the current state of these AI coding agents.

vonneumannstan 5 days ago | parent | prev [-]

"It can't do 9.9-9.11 or count the number of r's in strawberry!"

lol

Nevermark 5 days ago | parent [-]

Since models are given tokens, not letters, to process, the famous issues with counting letters are not indicative of incompetence. Letters are simply sub-sensory for the model.

None of us can reliably count the e’s as someone talks to us, either.
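
You can see the split directly with a tokenizer. A minimal sketch, assuming OpenAI's tiktoken package is installed (other models use different vocabularies, so the exact split varies):

    # pip install tiktoken
    import tiktoken

    # Grab one common tokenizer; other models split text differently.
    enc = tiktoken.get_encoding("cl100k_base")

    ids = enc.encode("strawberry")
    pieces = [enc.decode_single_token_bytes(t).decode("utf-8") for t in ids]
    print(pieces)  # a few multi-letter chunks, not ten separate letters

However the word gets chunked, the model receives token ids, never a sequence of letters it could count.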

hatefulmoron 5 days ago | parent [-]

It does say something that the models simultaneously:

a) "know" that they're not able to do it for the reason you've outlined (as in, you can ask about the limitations of LLMs for counting letters in words)

b) still blindly engage with the query and get the wrong answer, with no disclaimer or commentary.

If you asked me how many atoms there are in a chair, I wouldn't just give you a large natural number with no commentary.

Nevermark 4 days ago | parent [-]

That is interesting.

A factor might be that they are trained to behave like people who can see letters.

During training they have no ability not to comply, and during inference they have no ability to choose to operate differently than they did during training.

A pre-prompt or co-prompt requesting that they only answer questions about sub-token information when they actually have reason to know the answer would be a better test; something like the sketch below.
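
Purely as an illustrative, untested sketch of such a pre-prompt:

    You process text as tokens and cannot directly perceive individual
    characters. If a question depends on character-level detail (counting
    letters, comparing digit strings, and so on), say so first, and answer
    only if you can derive the result another way, e.g. by spelling the
    word out and working from the spelled-out pieces.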

hatefulmoron 3 days ago | parent [-]

Your prompting suggestion would certainly make them much better at this task, I would think.

I think it just points to the fact that LLMs have no "sense of self". They have no real knowledge or understanding of what they know or what they don't know. LLMs will not even reliably play the character of a machine assistant: run them long enough and they will play the character of a human being with a physical body[0]. All of this suggests that "Claude the LLM" is just the mask that it produces tokens through at first.

The "count the number of 'r's in strawberry" test seems to just be the easiest/fastest way to watch the mask slip. Just like that, they're mindlessly acting like a human.

[0]: https://www.anthropic.com/research/project-vend-1

fatata123 5 days ago | parent | prev | next [-]

LLMs are plateauing, and you’re in denial.

vonneumannstan 5 days ago | parent [-]

Show me one metric they are plateauing on.

pigpag 5 days ago | parent | prev [-]

[dead]