Maybe. The reality of software engineering is that there are a lot of mediocre developers on the market and a lot of mediocre code being written; that's part of the industry, and the job of engineers working with other engineers and/or LLMs is quality control, through e.g. static analysis, code reviews, teaching, studying, etc.
input_sh | 21 hours ago:

And those mediocre engineers put their work online, as do top-tier developers. In fact, I would say the scale is likely tilted towards mediocre engineers putting more stuff online than really good ones. So statistically speaking, when the "AI" consumes all of that as its training data and returns the most likely answer when prompted, what percentage of developers will it be better than?
simonw | 11 hours ago:

That's not how modern LLMs are built. The days of dumping everything on the internet into the training data and crossing your fingers are long past. Anthropic and OpenAI spent most of 2025 focusing almost exclusively on improving the coding abilities of their models, through reinforcement learning combined with additional expert curation of training data.
input_sh | 9 hours ago:

Silly old me, how could I have forgotten about such drastic improvements between, say, Sonnet 3.7 and Sonnet 4.6. It's 500x better now! Thank you for teaching me, AI understander. You're definitely not detached from reality one bit. It's me, obviously.
simonw | 8 hours ago:

Have you seen how many people are talking about the November 2025 inflection point, where the models ticked over from being good at running coding agents to being really good at it?
wartywhoa23 | 21 hours ago:

These people also prefer plastic, averaged-out images of AI girls to real ones. The Average is their top tier.
jasomill | 20 hours ago:

In other words, there's probably a market for a model trained on a curated collection of high-quality code.
simonw | 11 hours ago:

That is what we have today: it's why Opus 4.5+ and GPT-5.2+ are so much better at driving coding agents than previous models were.
kelipso | 18 hours ago:

Doubt it's sustainable. These big models keep improving at a fast pace, and any progress like this made in a niche would likely be caught up to very quickly.
input_sh | 21 hours ago:

Even speaking from a purely statistical perspective, it is quite literally impossible for "AI" that outputs the world's most average answer to be better than "most engineers". In fact, it's pretty easy to conclude what percentage of engineers it's better than: all it does is consume as much data as possible and return the statistically most probable answer, therefore it's gonna be better than roughly 50% of engineers.

Maybe you can claim that it's better than 60% of engineers because bottom-of-the-barrel engineers tend to not publish their works online for it to be used as training data, but for every one of those you have a bunch of non-engineers that don't do this for a living putting their shitty attempts at getting stuff done using code online, so I'm actually gonna correct myself immediately and say that it's about 40%.

The same goes for every other output: it's gonna make the world's most average article, the most average song in a genre, and so on. You can nudge it to be slightly better than average with great effort, but no, you absolutely cannot make it better than most.
bitexploder | 14 hours ago:

Which hinges on something unknown: code-quality evaluation in training. Do you know whether there is any sort of code-quality evaluation applied to the training data? I think the argument is a little reductive without knowing the actual details of the model-training input pipeline and the stages of generating the output along that same dimension, but I don't really have any concrete knowledge here either, so your baseline assumption could be right.
theshrike79 | 21 hours ago:

The thing that separates AI agents from normal programmers is that agents don't get bored or tired. For most engineers the ability might be there, but the motivation or willingness to write, for example, 20 different test cases verifying that the 3-line bug you just fixed is fixed FOR SURE usually isn't there. You add maybe 1-2 tests, because they're annoying boilerplate crap to write, and create the PR. CI passes, you added new tests, someone will approve it. (Yes, your specific company is of course better than this and requires rigorous testing, but the vast majority aren't. Most don't even add the two tests as long as the issue is fixed.)

An AI agent will happily and without complaining use red/green TDD on the issue: create the 20 tests first, make sure they fail (as they should), fix the issue, and then check again that all tests pass. And it'll do it in 30 minutes while you do something else.
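For the sake of concreteness, the red/green flow described here might look like the sketch below. The `clamp()` function and its hypothetical off-by-one bug are made up for illustration; the point is the exhaustive table of cases an agent will grind out without complaint:

```python
# Hypothetical 3-line fix: the buggy version of clamp() returned hi
# whenever value was below lo. The fixed version is shown here.
def clamp(value, lo, hi):
    """Clamp value into the inclusive range [lo, hi]."""
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    return max(lo, min(value, hi))

# Red/green: each case is written first, confirmed to fail against the
# buggy version, then confirmed to pass after the fix.
CASES = [
    (5, 0, 10, 5),    # value inside the range
    (-1, 0, 10, 0),   # below the range: the case the bug broke
    (11, 0, 10, 10),  # above the range
    (0, 0, 10, 0),    # exactly at the lower bound
    (10, 0, 10, 10),  # exactly at the upper bound
]

for value, lo, hi, expected in CASES:
    assert clamp(value, lo, hi) == expected
```

A human reviewer would likely stop at two or three of these cases; the agent's marginal cost of the other seventeen is near zero.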
rel_ic | 21 hours ago:

This is kind of like saying a kid can never become a better programmer than the average of his teachers. IMHO, the reasons not to use AI are social, not logical.
input_sh | 21 hours ago:

The kid can learn and become better over time, while "AI" can only be retrained using better training data. I'm not against using AI by any means, but I know what to use it for: stuff where I can only do worse than half the population because I can't be bothered to learn it properly. I don't want to toot my own horn, but I'd say I'm definitely better at my niche than 50% of people. There are plenty of other niches where I'm not.
arcanemachiner | 14 hours ago:

Yeah, but it's been trained on the boring, repetitive stuff, and A LOT of code that needs to be written is just boring, repetitive stuff. Leaving the busywork to the drones frees up time for the mind to solve the interesting and unsolved problems.
nitwit005 | 12 hours ago:

The AI doesn't know what good or bad code is. It doesn't know what surpassing someone means. It's been trained to generate text similar to its training data, and that's what it does. If you fed it only good code, we'd expect a better result, but currently we're feeding it average code. The cost of evaluating code quality across such a huge data set is too high.
recursive | 10 hours ago:

The training data includes plenty of examples of labelled good and bad code, and comparisons between implementations along with their trade-offs, costs, and benefits. I think it absolutely does "know" good code, in the sense that it can know anything at all.
nitwit005 | 9 hours ago:

There does exist some text making comparisons like that, but compared to the raw quantity of totally unlabeled code out there, it's tiny. You can do some basic checks like "does it actually compile", but for the most part you'd really need to go out and do manual categorization, which would be brutally expensive.
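The cheap "does it actually compile" check mentioned here is about the only filter that scales. A minimal sketch of what such a filter might look like for Python snippets (the candidate snippets are made up; real pipelines are far more involved):

```python
import ast

def parses(source: str) -> bool:
    """Cheap filter: does the snippet at least parse as Python?

    Note this says nothing about quality: terrible code parses fine.
    """
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

# Hypothetical candidate training snippets scraped from the web.
candidates = [
    "def add(a, b):\n    return a + b\n",  # valid, survives the filter
    "def broken(:\n    return\n",          # syntax error, dropped
]

kept = [src for src in candidates if parses(src)]
```

Anything beyond this (is the code idiomatic? correct? well-designed?) needs human judgment or expensive model-assisted review, which is the cost argument above.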
ValentineC | 10 hours ago:

> Maybe you can claim that it's better than 60% of engineers because bottom-of-the-barrel engineers tend to not publish their works online for it to be used as training data, but for every one of those you have a bunch of non-engineers that don't do this for a living putting their shitty attempts at getting stuff done using code online, so I'm actually gonna correct myself immediately and say that it's about 40%.

And there are a bunch of engineers from certain cultures who don't know what they don't know, but believe that a massive portfolio of slop is better than one or two well-developed projects. I can only hope that the people training the good coding models know to tell the AI that these are antipatterns, not patterns.
enraged_camel | 13 hours ago:

> Even speaking from a pure statistical perspective, it is quite literally impossible for "AI" that outputs world's-most-average-answer to be better than "most engineers". In fact, it's pretty easy to conclude what percentage of engineers it's better than: all it does is it consumes as much data as possible and returns the statistically most probable answer

Yeah, you come across as someone who thinks the AI simply spits out the average of the code in its training data. I don't think that understanding is accurate, to say the least.