zelphirkalt 3 days ago:
> LLMs are often in the top decile of coding challenges, which are already limited to better-than-average developers. Now, these same models that get top decile scores in challenges are still not in the top decile overall, because the role of software developer is much broader than just leetcode, but this still demonstrates the point: LLMs do not tend towards the mean.

Like I said: very narrowly defined problems, yes, they can excel at those. But sometimes they don't even excel at that. Every couple of months I try to get LLMs to write one specific function, and they didn't succeed in January, nor did they succeed a few weeks ago. Basically zero progress in their ability to follow instructions regarding the design of the function. They cannot think, and as soon as something is rare in their training data, or even non-existent, they fail utterly. They disregard even direct instructions like "do not make use of the following functions ...", because they cannot help falling back on the data they were trained on.

And before you ask: I tried this on recent Qwen Coder, Mistral 3.1, and ChatGPT, and someone else tried it for me on Claude-something. None of them did any better; all were incapable of doing it. If the solution is in their training data at all, its signal is so weak that they never consider it.

This leads me to question how much shit code they introduce while solving narrowly defined problems like those in coding competitions.
ben_w 2 days ago (parent):
> Like I said: Very narrowly defined problems, yes they can excel at it.

See next paragraph.