codyb 5 days ago

Really? No signs of slowing down?

A year or two ago, when LLMs popped onto the scene, my coworkers would say, "Look at how great this is, I can generate test cases!"

Now my coworkers are saying, "I can still generate test cases! And if I'm _really specificcccc_, I can get it to generate small functions too!"

It seems to have slowed down considerably, but maybe that's just me.

lazide 5 days ago

At the beginning, it’s easy to extrapolate ‘magic’ to ‘can do everything’.

Eventually it stops being magic, the thinking changes, and we start to see the pros and cons, and the gaps.

A lot of people are still in the ‘magic’ phase.

vonneumannstan 5 days ago

Yeah, NGL, if you can't get a model that is top 1% in competitive coding and at gold-medal level on the IMO to do anything useful, that's just an indictment of your skill with them.

tuesdaynight 5 days ago

Sorry for the bluntness, but you sound like you have a lot of opinions about LLM performance for someone who says they don't use them. It's okay if you are against them, but if you last used them 3 years ago, you have no idea whether they have improved or not.

Jensson 5 days ago

You can see what people built with LLMs 3 years ago and what they build with LLMs today, and compare the two.

That is a very natural and efficient way to do it, and also more reliable than using your own experience since you are just a single data point with feelings.

You don't have to drive a car to see where cars were 20 years ago, see where cars are today, and say: "it doesn't look like cars will start flying anytime soon".

tuesdaynight 4 days ago

Fair, but what about saying that cars haven't improved in 20 years (since the last time you drove one) because they still can't fly?

Peritract 5 days ago

> you sound like you have a lot of opinions about LLM performance for someone who says that doesn't use them

It's not reasonable to treat only opinions that you agree with as valid.

Some people don't use LLMs because they are familiar with them.

tuesdaynight 4 days ago

My point is that this person IS NOT familiar with them, while feeling confident enough to say that these tools haven't improved over time. I'm not saying that their opinions are invalid, just highlighting their lack of experience with the current state of these AI coding agents.

vonneumannstan 5 days ago

"It can't do 9.9-9.11 or count the number of r's in strawberry!"

lol

Nevermark 5 days ago

Since models are given tokens, not letters, to process, the famous issues with counting letters are not indicative of incompetence. Individual letters are just sub-sensory for the model.

None of us can reliably count the e’s as someone talks to us, either.
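To make the tokens-not-letters point concrete, here is a toy sketch in Python. The vocabulary `TOY_VOCAB` and the `tokenize` helper are hypothetical, made up for illustration; real tokenizers (e.g. BPE-based ones) learn their vocabularies from data and may split "strawberry" quite differently. The point is only that the model receives a short list of integer IDs, and nothing in those integers directly encodes how many r's each piece contains.

```python
# HYPOTHETICAL toy vocabulary: real tokenizers learn their own pieces and IDs.
TOY_VOCAB = {"str": 101, "aw": 102, "berry": 103,
             "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8}


def tokenize(text: str, vocab: dict[str, int]) -> list[str]:
    """Greedy longest-match segmentation of `text` into vocabulary pieces."""
    pieces = []
    i = 0
    longest = max(len(p) for p in vocab)
    while i < len(text):
        # Try the longest candidate piece first, then shrink.
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                pieces.append(piece)
                i += size
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[i]!r}")
    return pieces


pieces = tokenize("strawberry", TOY_VOCAB)
ids = [TOY_VOCAB[p] for p in pieces]
print(pieces)  # ['str', 'aw', 'berry']
print(ids)     # [101, 102, 103]

# With character access, counting letters is trivial...
print("strawberry".count("r"))  # 3
# ...but the model only ever sees [101, 102, 103]; the spelling of each
# piece is something it would have had to learn indirectly.
```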

hatefulmoron 5 days ago

It does say something that the models simultaneously:

a) "know" that they're not able to do it for the reason you've outlined (as in, you can ask about the limitations of LLMs for counting letters in words)

b) still blindly engage with the query and get the wrong answer, with no disclaimer or commentary.

If you asked me how many atoms there are in a chair, I wouldn't just give you a large natural number with no commentary.

Nevermark 4 days ago

That is interesting.

A factor might be that they are trained to behave like people who can see letters.

During training they have no ability to not comply, and during inference they have no ability to choose to operate differently than during training.

A pre-prompt or co-prompt requesting that they only answer questions about sub-token information if they believed they actually had reason to know the answer would be a better test.

hatefulmoron 3 days ago

Your prompting suggestion would certainly make them much better at this task, I would think.

I think it just points to the fact that LLMs have no "sense of self". They have no real knowledge or understanding of what they know or what they don't know. LLMs will not even reliably play the character of a machine assistant: run them long enough and they will play the character of a human being with a physical body[0]. All this suggests that "Claude the LLM" is just the mask it wears when it starts producing tokens.

The "count the number of 'r's in strawberry" test seems to just be the easiest/fastest way to watch the mask slip. Just like that, they're mindlessly acting like a human.

[0]: https://www.anthropic.com/research/project-vend-1