ben_w | 3 days ago
I suspect we're not in strong disagreement here, because you recognise that not all humans are equal, and that some are indeed worse than LLMs. But:

> because by definition LLMs tend towards the mean

This part is false: the mean human can't write code at all. Also, as per your own point:

> They may excel at solving very narrow problems with decent results, like in that programming competition recently.

LLMs are often in the top decile of coding challenges, which are already limited to better-than-average developers. Now, these same models that get top decile scores in challenges are still not in the top decile overall, because the role of software developer is much broader than just leetcode, but this still demonstrates the point: LLMs do not tend towards the mean.

> But those are indeed very narrowly defined problems, and while they may solve it decently in limited time, that is roughly their overall limit, while a human, given more time, can excel to a much higher level.

Except "code" is itself not narrowly defined, even despite what I just said. Even within one programming language, comprehending the natural-language task description is much harder and more general than any programming language, and both the programming language and all of its libraries are described in a mixture of natural and formal language. Even the ability to recognise whether it's looking at C or JavaScript is something the model had to learn, rather than being explicitly programmed with that knowledge.

Now sure, I will absolutely say that if the working definition of "intelligence" is about how few examples are needed to learn a new thing, then transformer models are "stupid". But, to a certain degree, they make up for being very very stupid by being very very stupid very very quickly and very very cheaply: cheap enough and fast enough that when you do hit their skill limits, there are many cases where you can afford to boost them a noticeable degree, even though every n quality points you need to boost them by comes with a roughly 2^n increase in cost in both time and money. Not always, and it's an exponential cost per linear improvement, but often.
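A rough toy sketch of that last cost claim, assuming (purely for illustration) that the boost comes from best-of-n resampling and that each extra quality point needs twice as many attempts; the per-attempt price is made up, not a real figure:

    # Toy model of "exponential cost per linear improvement", assuming each
    # extra quality point requires doubling the number of sampled attempts.
    # The $0.02 per attempt is an invented, illustrative number.
    COST_PER_ATTEMPT_USD = 0.02

    def boost_cost(extra_quality_points: int) -> float:
        """Total cost if each additional quality point doubles the attempts."""
        attempts = 2 ** extra_quality_points
        return attempts * COST_PER_ATTEMPT_USD

    for n in range(1, 8):
        print(f"+{n} points: {2**n:4d} attempts, ~${boost_cost(n):.2f}")
    # Linear gains, exponential spend: +7 points already means 128 attempts.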
zelphirkalt | 3 days ago
> LLMs are often in the top decile of coding challenges, which are already limited to better-than-average developers. Now, these same models that get top decile scores in challenges are still not in the top decile overall because the role of software developer is much broader than just leetcode, but this still demonstrates the point: LLMs do not tend towards the mean.

Like I said: very narrowly defined problems, yes, they can excel at those. But sometimes they don't even excel at that. Every couple of months I try to make LLMs write a specific function, and they succeeded neither in January nor a few weeks ago. Basically zero progress in their ability to follow instructions about the design of the function. They cannot think, and as soon as something is rare in their training data, or even non-existent, they fail utterly. Even direct instructions like "do not make use of the following functions ..." they disregard, because they cannot help themselves given the data they were trained on.

And before you ask: I tried this on recent Qwen Coder, Mistral 3.1, and ChatGPT, and someone else tried it for me on Claude-something. None of them did any better. All incapable of doing it. If the solution is in their training data at all, its signal is so weak that they never consider it.

This leads me to question how much shit code they introduce to solve a narrowly defined problem like in coding competitions.