| ▲ | dfxm12 4 days ago |
| there’s one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers—or open source maintainers—and expects the “code review” process to handle the rest. Is anyone else seeing this in their orgs? I'm not... |
|
| ▲ | 0x500x79 4 days ago | parent | next [-] |
| I am currently going through this with someone in our organization. Unfortunately, this person is vibe coding completely, and even the PR process is painful:
* The coding agent reverts previously applied feedback
* The coding agent doesn't follow standards used throughout the codebase
* The coding agent reinvents solutions that already exist
* PR feedback is answered with agent output
* 50k-line PRs for what should have been a 10-20 line change
* Lack of testing (there are some automated tests, but their validations are slim/lacking)
* Bad error handling/flow handling |
| |
| ▲ | nunez 4 days ago | parent | next [-] | | > 50k line PRs that required a 10-20 line change This is hilarious. Not when you're the reviewer, of course, but as a bystander, this is expert-level enterprise-grade trolling. | |
| ▲ | LandR 4 days ago | parent | prev | next [-] | | Fire them? | | |
| ▲ | 0x500x79 4 days ago | parent | next [-] | | I believe it is getting close to this. Things like this just take time though, and when this person talks to management/leadership they talk about how much they are producing and how everyone is blocking their work. So it becomes challenging political maneuvering, depending on the ability of certain leadership to see through the BS. (By my organization, I meant my company - this person doesn't report to me or to anyone in my tree.) | |
| ▲ | JambalayaJimbo 4 days ago | parent | prev [-] | | This is not really an option for your standard IC. |
| |
| ▲ | gardenhedge 4 days ago | parent | prev [-] | | Just reject the PR? |
|
|
| ▲ | briliantbrandon 4 days ago | parent | prev | next [-] |
| I'm seeing a little bit of this. However, I will add that the primary culprits are engineers who were submitting low-quality PRs before they had access to LLMs; they can just submit them faster now. |
| |
| ▲ | lm28469 4 days ago | parent | next [-] | | LLMs are tools that make mediocre devs 100x more "productive" and good devs 2x more productive | | |
| ▲ | jennyholzer2 4 days ago | parent | next [-] | | From my vantage point, I would argue LLMs make good devs around 0.65x as productive | | |
| ▲ | roblh 4 days ago | parent | next [-] | | I think they make good devs 2x more productive for the first month, which then slowly declines as that good dev spends less time actually writing and understanding and debugging code until it falls well below the 1x mark. It's basically a high-interest loan people take against their own skills. For some people that loan might be worth it. Maybe they're trying to change their role in an organization and need the boost to start taking up new responsibilities they want to own. I think it's temporary though: a slow shift into "skim mode", where authors just don't put quite the same amount of effort into understanding what's being churned out. I dunno, that's just what I've seen. | | |
| ▲ | candiddevmike 4 days ago | parent | next [-] | | Because there's a mental overhead when you're not writing the code that is arguably worse than when you are writing the code. No one is talking about this enough IMO, but that's why everyone is so exhausted when using LLMs and ends up just pulling the slot machine until it works without actually reading it. Reading code sucks; it always has. The flow state we all crave is when the code is in our working memory as an understood construct and we're just translating our mental model to a programming language. You don't get that with LLMs. It devolves into programming minutiae equivalent to "a little to the left", but with the added complexity that "left" is hundreds of lines of code. | |
| ▲ | AstroBen 4 days ago | parent | prev [-] | | I really feel this myself. If I write home-grown organic code then I have no choice but to fully understand the problem. Using an LLM it's very easy to be lazy, at least in the short term Where does that get me after 3 months? I end up working on a codebase I barely understand. My own skills have degraded. It just gets worse the longer you go This is also coming from my experience in the best case scenario: I enjoy coding and am working on something I care about the quality of. Lots of people don't have even that |
| |
| ▲ | dsego 4 days ago | parent | prev | next [-] | | I think on average a dev can be x percent more productive, but there is a best case and worst case scenario. Sometimes it's a shortcut to crank out a solution quickly, other times the LLM can spin you in circles and you lose the whole day in a loop where the LLM is fixing its own mistakes, and it would've been easier to just spend some time working it out yourself. | |
| ▲ | bluGill 4 days ago | parent | prev | next [-] | | Good devs are still learning how to use LLMs, and so are willing to accept the 0.65x once in a while. Any complex tool will have a learning curve. Most tools improve over time. As such good devs either have found how to use LLMs to make them more productive (probably not 10x, but even 1.1x is something), or they try them again every few months to see if things are better. | | |
| ▲ | jennyholzer2 4 days ago | parent [-] | | [flagged] | | |
| ▲ | simonw 4 days ago | parent [-] | | Hi, delusional developer reporting for duty here. | | |
| ▲ | Avicebron 4 days ago | parent [-] | | How are you measuring productivity these days Simon? Do you have a boss that has certain expectations? If you don't hit those are you going to lose your house? | | |
| ▲ | simonw 4 days ago | parent [-] | | I work for myself, so mainly through guilt and self-doubt. | | |
| ▲ | wiml 4 days ago | parent [-] | | One of the things LLMs are demonstrably good at is eliminating self-doubt. That's why they're so disastrous. |
|
|
|
|
| |
| ▲ | 4 days ago | parent | prev | next [-] | | [deleted] | | | |
| ▲ | coffeebeqn 4 days ago | parent | prev | next [-] | | I just spent a day trying to get Claude to write reasonable unit tests and then after sleeping on it, reverted everything and did it myself. I’m not gonna be using it for a while because it 0.5x’d me once again | |
| ▲ | square_usual 4 days ago | parent | prev [-] | | Yep, that's why very accomplished, widely regarded developers like Mitchell Hashimoto and Antirez use them. They need to make programming more challenging to keep it fun. | | |
| |
| ▲ | chasd00 4 days ago | parent | prev | next [-] | | LLMs are great at spewing content, and code is a form of "content". I think what we're seeing is software development turning into YouTube: content creators cranking out content, some of it great, most of it meh, a lot of it really bad. I do find it all a bit funny and ironic. My wife was a journalist and bemoaned news blogs and social media for terrible, terrible writing that claimed to be journalism. She would tell me about how much work quality journalism is, and all the mistakes these bloggers and social media make, and how detrimental it was to society at large, blah blah blah. Now the power to create tons and tons of code (i.e. content) is in the hands of everyone, and here we are complaining about it just like my wife used to complain about journalism. I think the myth of the highly regarded Software Developer perched in front of the warming glow of a screen, solving and automating critical problems, is coming to an end. Deservedly, really: there's nothing more special about typing words into an editor than, say, framing a house. The novelty is over. | |
| ▲ | lunar_mycroft 4 days ago | parent | prev [-] | | [citation needed]. No study I've seen shows even a 50% productivity improvement for programming, let alone a 100% or 9900% improvement. |
| |
| ▲ | dfxm12 4 days ago | parent | prev [-] | | What's the ratio of people who do things the right way vs. those who don't? I mean, is it a matter of giving them feedback to remind them what a "quality PR" is? Does that help? | |
| ▲ | briliantbrandon 4 days ago | parent | next [-] | | It's roughly 1/10 that are causing issues. Not a huge deal but dealing with them inevitably takes up a couple hours a week. We also have a codebase that is shared with some other teams and our primary offenders are on one of those separate teams. I think this is largely an issue that can be solved culturally within a team, we just unfortunately only have so much input on how other teams work. It doesn't help either when their manager doesn't seem to care about the feedback... Corporate politics are fun. | | |
| ▲ | dfxm12 4 days ago | parent [-] | | Yeah, I mean to get back to the original statement in the blog, this seems like less of a tech issue and more of a culture issue. The LLM enables the junior to do this once. It's the team culture that allows them to continue doing it. |
| |
| ▲ | jennyholzer2 4 days ago | parent | prev [-] | | LLMs have dramatically empowered sociopath software developers. If you are sufficiently motivated to appear more "productive" than your coworkers, you can force them to review thousands of lines of incorrect AI slop code while you sit back and mess around with your chatbots. Your coworkers no longer have enough time to work on their in-progress PRs, so you can dominate the development team in terms of LOC shipped. Understand that sociopaths are skilled at navigating social and bureaucratic environments. A sociopath who ships the most LOC will get the promotion every single time. | | |
| ▲ | andy99 4 days ago | parent [-] | | Only if leadership lets them. Right now (anecdotally) a lot of "leaders" don't understand the difference between AI-generated and human-generated work, and just look at LOC as productivity, so all the incentives favor AI coding. But that will change. | |
| ▲ | heliumtera 4 days ago | parent [-] | | It will never change.
Managers will consider every stupid metric players push to sell their solutions.
Be it code coverage, extensive CI/CD pipelines with useless steps, "productivity gains" with gen tools.
The gen tools euphoria is stupid and will cease to exist, but before this it was BDD, TDD, DDD, test before, test after, test your mocks, transpile to a different language and then ignore the output, code maturity, best practices, OOP, pants-on-head-oriented programming...
There is always something stupid on the horizon; this is certainly not the last stupid craze |
|
|
|
|
|
| ▲ | zx2c4 4 days ago | parent | prev | next [-] |
| Voila: https://github.com/WireGuard/wireguard-android/pull/82
https://github.com/WireGuard/wireguard-android/pull/80 In that first one, the double-pasted AI retort in the last comment is pretty wild. In both of these, look at the actual "files changed" tab for the wtf. |
| |
| ▲ | newsoftheday 4 days ago | parent | next [-] | | That's a good example of what we're seeing as leads, thanks. | |
| ▲ | 4 days ago | parent | prev | next [-] | | [deleted] | |
| ▲ | drio 3 days ago | parent | prev | next [-] | | Scary stuff. I’d love to hear your thoughts on LLMs, Jason. How do you use them in your projects? Do they play a role in your workflow at all? | |
| ▲ | IshKebab 4 days ago | parent | prev [-] | | Yeah, this guy's comment here is spot on: https://github.com/WireGuard/wireguard-android/pull/80#issue... I recently reviewed a PR that I suspect is AI generated. It added a function that doesn't appear to be called from anywhere. It's shit because AI is absolutely not on the level of a good developer yet. So it changes the expectation. If a PR is not AI generated, then there is a reasonable expectation that a vaguely competent human has actually thought about it. If it's AI generated, then the expectation is that they didn't really think about it at all and are just hoping the AI got it right (which it very often doesn't). It's rude because you're essentially pawning off work that the author should have done onto the reviewer. Obviously not everyone dumps raw AI-generated code straight into a PR, so I don't have any problem with using AI in general. But if I can tell that your code is AI generated (as you easily can in the cases you linked), then you've definitely done it wrong. |
|
|
| ▲ | fnands 4 days ago | parent | prev | next [-] |
| A friend of mine is working for a small-ish startup (11 people), and he gets to work and sees the CTO push 10k LOC changes straight to main at 3 am. Probs fine when you are still in the exploration phase of a startup; scary once you get to some kind of stability |
| |
| ▲ | ryandrake 4 days ago | parent | next [-] | | I feel like this becomes kind of unacceptable as soon as you take on your first developer employee. 10K LOC changes from the CTO is fine when it's only the CTO working on the project. Hell, for my hobby projects, I try to keep individual commits under 50-100 lines of code. | | |
| ▲ | bonesss 4 days ago | parent [-] | | Templates and templating languages are still a thing. Source generators are a thing. Languages that support macros exist. Metaprogramming is always an option. Systems that write systems… If these AIs are so smart, why the giant LOCs? Sure, it’s cheaper today than yesterday to write out boilerplate, but programming is about eliminating boilerplate and using more powerful abstractions. It’s easy to save time doing lots of repetitive nonsense; stopping the nonsense should be the point. |
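As a toy illustration of that point, here is a minimal Python sketch (the Endpoint type is invented) where one declaration replaces the constructor/equality/repr boilerplate an agent would happily spell out line by line:

```python
from dataclasses import dataclass, field

# Instead of hundreds of lines of hand-rolled __init__/__repr__/__eq__
# boilerplate (typed by hand or by an LLM), one declaration generates
# them all from a single source of truth.
@dataclass
class Endpoint:  # hypothetical example type
    host: str
    port: int = 443
    tags: list[str] = field(default_factory=list)

e = Endpoint("example.com")
assert e.port == 443
assert e == Endpoint("example.com", 443, [])
```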
| |
| ▲ | peab 4 days ago | parent | prev | next [-] | | Lol I worked at a startup where the CTO did this. The problem was that it was pure spaghetti code. It was so bad it kept me up at night, thinking about how to fix things. I left within 30 days | |
| ▲ | coffeebeqn 4 days ago | parent | prev | next [-] | | I worked with a “CTO” who did that before LLMs - one of the worst jobs I have had in the last 10 years. I spent at least 50% of my time putting out fires or refactoring his garbage code | |
| ▲ | tossandthrow 4 days ago | parent | prev | next [-] | | The CTO is ultimately responsible for the outcome and will be there at 4am to fix stuff. | |
| ▲ | pjc50 4 days ago | parent [-] | | Yes .. and no. Someone who does this will definitely make the staff clean up after them. |
| |
| ▲ | jimbohn 4 days ago | parent | prev | next [-] | | I'd go mental if I was a SWE having to mop that up later | |
| ▲ | titzer 4 days ago | parent | prev [-] | | That's...idiotic. | | |
|
|
| ▲ | davey48016 4 days ago | parent | prev | next [-] |
| A friend of mine has a junior engineer who does this and then responds to questions like "Why did you do X?" with "I didn't, Claude did, I don't know why". |
| |
| ▲ | tossandthrow 4 days ago | parent | next [-] | | That would be immediate grounds for termination in my book. | |
| ▲ | fennecfoxy 4 days ago | parent [-] | | Yes, if they can't debug + fix the reason the production system is down or not working correctly then they're not doing their job, imo. Developers aren't hired to write code that's never run (at least in my opinion). We're also responsible for running the code/keeping it running. |
| |
| ▲ | gardenhedge 4 days ago | parent | prev | next [-] | | Some other comments suggest immediately firing, but a junior engineer needs to be mentored. It should be explained to them clearly that they need to understand the changes they have made. They should also be pointed towards the coding standards and SDLC documentation. If they refuse to change their ways, then firing makes sense. |
| ▲ | Ekaros 4 days ago | parent | prev | next [-] | | I think the words that would follow from me would get me sent to HR... And if it was repeated... Well, I would probably get fired... |
| ▲ | jennyholzer2 4 days ago | parent | prev | next [-] | | no hate but i would try to fire someone for saying that | | | |
| ▲ | insin 4 days ago | parent | prev [-] | | See also: "Why did you do X?" → flurry of new commits → conversation marked as resolved. And not just from juniors |
|
|
| ▲ | stackskipton 4 days ago | parent | prev | next [-] |
| Yep. Remember, people not posting on this website are just grinding away at jobs where their individual output does not matter, and their entire motivation is to work JUST hard enough not to get fired. They don't get stock grants, extremely favorable stock options, or anything else; they get salary and MAYBE a small bonus based off business factors they have little control over. My eyes were opened when, 2 jobs ago, they said they would be blocking all personal web browsing from work computers. Multiple software devs were unhappy because they were using their work laptop for booking flights, dealing with their kids' school stuff, and other personal things. They did not have a personal computer at all. |
| |
| ▲ | throw1235435 3 days ago | parent | next [-] | | There are people posting on this website who are in that category, or at those companies. For example, most people working outside America as SWEs who like the profession. The chances of working for a place that gives stock options, or equity in general, are small, and in many countries equity is heavily penalised tax-wise. |
| ▲ | nutjob2 4 days ago | parent | prev [-] | | They don't have phones? | | |
| ▲ | stackskipton 4 days ago | parent [-] | | They do, but obviously a laptop is easier than doing it on their phone, which is what most of them ended up doing. |
|
|
|
| ▲ | mrkeen 4 days ago | parent | prev | next [-] |
| I don't see most PRs because they happen in other teams, but I am part of a Slack channel where there are too many "oops" messages for my liking. E.g., 1-2 times a month there's an SQL script posted that will be run against prod to "hopefully fix data for all customers who were put into a bad state from a previous code release". The person who posts this type of message most often is also the one running internal demos of the latest AI flows and trying to get everyone else on board. |
|
| ▲ | hexbin010 4 days ago | parent | prev | next [-] |
| Similar, at my last job. And the pushback was greater because super duper clever AI helped write it, which obviously knows more than any senior engineer could, so they were expecting an immediate PR approval and got all uppity when you tried to suggest changes. |
| |
| ▲ | endemic 4 days ago | parent [-] | | Hah! I've been trying to push back on this sort of thought. The bot writes code for you, not you for the bot. |
|
|
| ▲ | kaffekaka 4 days ago | parent | prev | next [-] |
| I thought we were not, but we had just been lucky. A sequence of events lately has shown that the struggle is real. This was not a junior developer though, but an experienced one. Experience does not equal skill, evidently. |
|
| ▲ | jennyholzer2 4 days ago | parent | prev | next [-] |
| i left my last job because this was endemic |
| |
|
| ▲ | peab 4 days ago | parent | prev | next [-] |
| Definitely seeing a bit of this, but it isn't constrained to junior devs. It's also pretty solvable by explaining to the person why it's not great, and just updating team norms. |
|
| ▲ | iamflimflam1 4 days ago | parent | prev | next [-] |
| I'm seeing it on some open source projects I maintain. Recently had 10 or so PRs come in. All very valid features - but from looking at them, not actually tested. |
|
| ▲ | zahlman 4 days ago | parent | prev | next [-] |
| Quite a few FOSS maintainers have been speaking up about it. |
|
| ▲ | nbaugh1 4 days ago | parent | prev | next [-] |
| Not at all. Submitting untested PRs is wildly outside of my experience. Having tests written to cover your code is a prerequisite for having your PR reviewed on our team. "Does it work", aka passing manual testing, is literally the bare minimum before submitting a PR |
| |
| ▲ | ncruces 4 days ago | parent [-] | | If it's all vibe coded, how do you know — without review — that the new tests, for a new feature, test anything useful at all? | | |
| ▲ | AnimalMuppet 3 days ago | parent [-] | | When I was in a test-driven development environment, one of our rules was that you had to see the test fail. You had to prove that it would actually test what you were trying to test. |
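A minimal sketch of that rule, assuming pytest and a hypothetical parse_duration function: run the test before writing the implementation and confirm it fails for the right reason, so you know it actually exercises the behavior.

```python
# Step 1: a stub that cannot accidentally pass.
def parse_duration(text: str) -> int:
    raise NotImplementedError

# Step 2: write the test and run it; seeing it fail (NotImplementedError
# here) proves the test really exercises the feature.
def test_parse_duration_handles_minutes():
    assert parse_duration("5m") == 300

# Step 3: only now write the real implementation and watch the test pass.
```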
|
|
|
| ▲ | bluGill 4 days ago | parent | prev | next [-] |
| It isn't only junior engineers, though; it is a small number of people at all levels. People do what they think they will be rewarded for. When you think your job is to write a lot of code, LLMs are great. When you need quality code, you start to ask whether LLMs are better or not. |
|
| ▲ | eudamoniac 4 days ago | parent | prev | next [-] |
| I started seeing it from a particularly poor developer sometime last year. I was the only reviewer for him so I saw all of his PRs. He refused to stop despite my polite and then not so polite admonishments, and was soon fired for it. |
|
| ▲ | neutronicus 4 days ago | parent | prev | next [-] |
| I'm not either. But LLMs don't really perform well enough on our codebase to allow you to generate things that even appear to work. And I'm the most junior member of my team at 37 years of age, hired in 2019. I really tried to follow the mandate from on high to use Copilot, but the Agent mode can't even write code that compiles with the tools available to it. Luckily I hooked it up to gptel, so I can at least ask it quick questions about big functions I don't want to read in emacs. |
| |
| ▲ | notpachet 4 days ago | parent [-] | | > And I'm the most junior member of my team at 37 years of age This sounds fucking awesome. | | |
| ▲ | neutronicus 4 days ago | parent [-] | | Would be nice to have someone enthusiastic junior to me. Most of the team is comfortable in their wheelhouse and when new stuff comes down the pipe it's hard to get them mobilized. I had leadership on a big green-field project and felt like we could have really used a junior. |
|
|
|
| ▲ | ncruces 4 days ago | parent | prev | next [-] |
| Yes, in the only successful OSS project that I “maintain.” Fully vibe coded, which at least they admitted. And when I pointed out the thing is off by an order of magnitude, and as such doesn't implement said feature — at all — we got pressed on our AI policy, so as not to waste their time. I don't have an AI policy, like I don't have an IDE policy, but things get ridiculous fast with vibe coding. |
|
| ▲ | 4 days ago | parent | prev | next [-] |
| [deleted] |
|
| ▲ | x3n0ph3n3 4 days ago | parent | prev | next [-] |
| It's been a struggle with a few teammates that we are trying to solve through explicit policy, feedback, and ultimately management action. |
| |
| ▲ | dfxm12 4 days ago | parent [-] | | Yeah, a slice of this is technology related, but it's really a policy issue. It's probably easier to manage with a tighter team. Maybe I'm taking team size for granted. |
|
|
| ▲ | nunez 4 days ago | parent | prev | next [-] |
| I feel like a story about some open-source project getting (and rejecting) mammoth-sized PRs hits HN every week! |
|
| ▲ | Yodel0914 4 days ago | parent | prev | next [-] |
| Not so much the huge PRs, but definitely the LLM generated code that the “developer” doesn’t understand. |
|
| ▲ | wizzwizz4 4 days ago | parent | prev | next [-] |
| It's not a new phenomenon. Time was, people would copy-paste from blog posts with the same effect. |
| |
| ▲ | lm28469 4 days ago | parent | next [-] | | Always the same old tiring "this has always been possible before in some remotely similar fashion hence we should not criticise anything ever again" argument. You could intuitively think it's just a difference of degree, but it's more akin to a difference of kind. Same with a nuke vs. a spear: both are weapons, but no one argues they're similar enough that we can treat them the same way | |
| ▲ | array_key_first 4 days ago | parent [-] | | Yes, I'm so over this argument. It can literally be made for anything, and it is! At the end of the day we're not performing war by poking other people with long sticks and we're not getting the word out by sending out a carrier pigeon. Methods and medium matters. |
| |
| ▲ | evilduck 4 days ago | parent | prev | next [-] | | I would bet in most organizations you can find a copy-pasted version of the top SO answer for email regex in their language of choice, and if you chase down the original commit author they couldn't explain how it works. | | |
| ▲ | 1-more 4 days ago | parent [-] | | I think it's impossible to actually write an email regex because addresses can have arbitrarily deeply nested escaping. I may have that wrong. I'd hope that regex would be .+@.+ and that's it (watch me get Cunninghammed because there is some valid address wherein those plusses should be stars). | | |
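For what it's worth, a sketch of that deliberately loose check in Python (the sample addresses are made up; the real validation is sending a confirmation mail):

```python
import re

# Anything, then an @, then anything: permissive on purpose, since the
# full RFC 5322 grammar (quoting, comments) is impractical to capture.
EMAIL_RE = re.compile(r".+@.+")

assert EMAIL_RE.fullmatch("alice@example.com")
assert EMAIL_RE.fullmatch('"odd quoted"@example.com')  # accepted, by design
assert not EMAIL_RE.fullmatch("no-at-sign")
```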
| |
| ▲ | nunez 4 days ago | parent | prev | next [-] | | Yeah, but being able to produce nuclear-sized 10k+ LOC PRs to open-source projects in minutes with near-zero effort definitely is. At least you had to use your brain to know which blog posts/SO answers to copypasta from. | |
| ▲ | bgwalter 4 days ago | parent | prev | next [-] | | I don't see the problem with fentanyl given that people have been using caffeine forever. | |
| ▲ | troyvit 4 days ago | parent | prev [-] | | I used to do that in simpler days. I'd add a link to where I copied it from so we could reference it if there were problems. This was for relatively small projects with just a few people. | | |
| ▲ | jennyholzer2 4 days ago | parent [-] | | > I'd add a link to where I copied it from LLMs can't do this. Your code is unambiguously better than any LLM code if you can comment a link to the stackoverflow post you copied it from. | | |
| ▲ | newsoftheday 4 days ago | parent | next [-] | | Agreed on the first part for sure, since an LLM is the computer/software version of a blender. So I agree with the second part too, then. | |
| ▲ | lcnPylGDnU4H9OF 4 days ago | parent | prev [-] | | > Your code is unambiguously better than any LLM code if you can comment a link to the stackoverflow post you copied it from. This is not a truism. "My" code might come from an LLM and that's fine if I can be reasonably confident it works. I might try to gain that confidence by testing the code and reading it to understand what it's doing. It is also true of blog post code, regardless of how I refer to the code; if I link to the blog post, it's because it does a better job of explaining than I ever could in code comments. Whether LLMs make one more productive is hard to measure but it seems to be missing the point to write this. The point is, including the code is a choice and one should be mindful of it, no matter the code's origin. At that point, this comes off like you just have something to prove; there doesn't seem to be a reason not to use the LLM code if you know it works and you know why it works. | | |
| ▲ | wizzwizz4 4 days ago | parent [-] | | Believing you know how it works and why it works is not the same as that actually being the case. If the code has no author (in that it's been plagiarised by a statistical process that introduces errors), there's nowhere to go if you realise "oops, I didn't understand that as well as I had thought!". | | |
| ▲ | lcnPylGDnU4H9OF 4 days ago | parent [-] | | > If the code has no author ... there's nowhere to go if you realise "oops, I didn't understand that as well as I had thought!" That's also true if I author the code myself; I can't go to anyone for help with it, so if it doesn't work then I have to figure out why. > Believing you know how it works and why it works is not the same as that actually being the case. My series of accidental successes producing working code is honestly starting to seem like real skill and experience at this point. Not sure what else you'd call it. | | |
| ▲ | wizzwizz4 3 days ago | parent [-] | | > so if it doesn't work then I have to figure out why. But it's built on top of things that are understood. If it doesn't work, then either: • You didn't understand the problem fully, so the approach you were using is wrong. • You didn't understand the language (library, etc) correctly, so the computer didn't grasp your meaning. • The code you wrote isn't the code you intended to write. This is a much more tractable situation to be in than "nobody knows what the code means, or has a mental model for how it's supposed to operate", which is the norm for a sufficiently-large LLM-produced codebase. > My series of accidental successes That somewhat misses the point. To write working code, you must have some understanding of the relationship between your intention and your output. LLMs have a poor-to-nonexistent understanding of this relationship, which they cover up with the ability to regurgitate (permutations of) a large corpus of examples – but this does not grant them the ability to operate outside the domain of those examples. LLM-generated codebases very much do not lie within that domain: they lack the clues and signs of underlying understanding that human readers and (to an extent) LLMs rely on. Worse, the LLMs do replicate those signals, but they don't encode anything coherent in the signal. Unless you are very used to critically analysing LLM output, this can be highly misleading. (It reminds me of how chess grandmasters blunder, and struggle to even remember, unreachable board positions.) Believing you know how LLM-generated code works, and why it works, is not the same as that actually being the case – in a very real sense that is different to that of code with human authors. | | |
| ▲ | lcnPylGDnU4H9OF 3 days ago | parent | next [-] | | > "nobody knows what the code means, or has a mental model for how it's supposed to operate" > Believing you know how LLM-generated code works, and why it works, is not the same as that actually being the case This is a strawman argument which I'm not really interested to engage. You can assume competence. (In a scenario where one doesn't make these mistakes, what's left in your argument? It is a sufficiently strong claim to say these cannot be avoided such that it is reasonable to dismiss the claim unless supporting evidence is provided. In other words, the solution is as simple as not making these mistakes.) As I wrote up-thread, including the code is a choice and one should be mindful of it. | | |
| ▲ | wizzwizz4 3 days ago | parent [-] | | I am assuming competence. Competent people make these mistakes. If "assume competence" means "assume that people do not make the mistakes they are observed to make", then why write tests? Wherefore bounds checking? Pilots are competent, so pre-flight checklists are a waste of time. Your doctor's competent: why seek a second opinion? Being mindful involves compensating for these things. It's possible that you're just that good – that you can implement a solution "as simple as not making these mistakes" –, in which case, I'd appreciate if you could write up your method and share it with us mere mortals. But could it also be possible that you are making these mistakes, and simply haven't noticed yet? How would you know if your understanding of the program didn't match the actual program, if you've only tested the region in which the behaviours of both coincide? | | |
| |
| ▲ | fragmede 3 days ago | parent | prev [-] | | Just like there are some easy "tells" with LLM-generated English, vibecode has a certain smell to it. Parallel variables that do the same thing are probably the most common tell I've seen in the hundreds of thousands of lines of vibecode I've generated and then reviewed (and fixed) by now. That's the philosophical Chinese room thought experiment though. It's a computer. Some sand that we melted into a special shape. Can it "understand"? Leave that for philosophers to decide. There's code that was generated via LLM and not yacc; fine. Code is code though. If you sit down and read all of the code to understand what each variable, function, and class does, it doesn't matter where the code came from; that is what we call understanding what the code does. Sure, most people are too lazy to actually do that, and again, vibecode has a certain smell to it, but to claim that because some artificial intelligence generated the code it's incomprehensible to humans seems unsupported. It's fair to point out that there may not be humans who have bothered to, but that's a different claim. If we simplify the question: if ChatGPT generates the code to generate the Fibonacci sequence, can we, as humans, understand that code? Can we understand it if a human writes those same seven lines of code? As we scale up to more complex code though, at what point does it become incomprehensible to human-grade intelligence? If it's all vibecode that isn't being reviewed and is just being thrown into a repo, then sure, no human understands it. But it's just code. With enough bashing your head against it, even if there are three singleton factory classes doing almost the exact same thing in parallel and they only share state on Wednesdays over an RPC mechanism that shouldn't even work in the first place, but somehow it does, code is still code. There's no arcane hidden whitespace that whispers to the compiler to behave differently because AI generated it. It may be weird and different, but have you tried Erlang? You huff enough of the right kind of glue and you can get anything to make sense. If we go back to the Chinese room thought experiment, though: if I, as a human, am able to work on tickets that cause intentional changes to the behavior of the vibecoded program/system, at what point does it become actual understanding vs merely thinking I understand the code? Say you start at BigCo and are given access to their million-line repo(s) with no docs and are given a ticket to work on. Ugh. You just barely started. But after you've been there for five years, it's obvious to you what the Pequad service does, and you might even know who gave it that name. If the claim is that LLMs generate code that's simply incomprehensible to humans, the two counterexamples I have for you are TheDailyWtf.com and Haskell. | |
| ▲ | wizzwizz4 3 days ago | parent [-] | | > but to claim that some because some artificial intelligence generated the code makes it incomprehensible to humans seems unsupported That's not my claim. My claim is that AI-generated code is misleading to people familiar with human-written code. If you've grown up on AI-generated code, I wouldn't expect you to have this problem, much like how chess newbies don't find impossible board states much harder to process than possible ones. |
|
|
|
|
|
|
|
|
|
| ▲ | JambalayaJimbo 4 days ago | parent | prev | next [-] |
| I’ve been seeing obviously LLM generated PRs, but not huge ones. |
|
| ▲ | bdangubic 4 days ago | parent | prev [-] |
| First time we'd see this there would be a warning; the second one is a pink slip