resonious 6 hours ago

This is it. The fact that the PR was vibe coded isn't the problem, and doesn't need to influence the way you handle it.

gdulli 5 hours ago | parent | next [-]

It would be willfully ignorant to pretend that there's not an explosion of a novel and specific kind of stupidity, and to not handle it with due specificity.

WalterSear 4 hours ago | parent | prev | next [-]

I contend that far and away the biggest difference between entirely human-generated slop and AI-assisted stupidity is the irrational reaction that some people have to the AI-assisted stuff.

JoshTriplett an hour ago | parent | next [-]

Many of the people who submit 9000-line AI-generated PRs today would, for the most part, not have submitted PRs at all before, or would not have made something that passes CI, or would not have built something that looks sufficiently plausible to make people spend time reviewing it.

WalterSear 22 minutes ago | parent [-]

9000-line PRs were never a good idea; they have only seemed plausible because we were forced to accept bad PR review practices. Coding was expensive, and management beat us into LGTMing them into the codebase to keep the features churning.

Those days are gone. Coding is cheap. The same LLMs that enable people to submit 9000-line PRs of chaos can be used to quickly turn them into more sensible work. If they genuinely can't do a better job, rejecting the PR is still the right response. Just push back.

hatefulmoron 4 hours ago | parent | prev | next [-]

Calling things "slop" is just begging the question. The real differentiating factor is that, in the past, "human-generated slop" at least took effort to produce. Perhaps, in the process of producing it, the human notices what's happening and reconsiders (or even better, improves it such that it's no longer "slop".) Claude has no such inhibitions. So, when you look at a big bunch of code that you haven't read yet, are you more or less confident when you find out an LLM wrote it?

WalterSear 3 hours ago | parent | next [-]

I have pretty much the same amount of confidence when I receive AI-generated or non-AI-generated code to review: my confidence is based on the person guiding the LLM, and their ability to do that.

Much more so than before, I'll comfortably reject a PR that is hard to follow, for any reason, including size. IMHO, the biggest change that LLMs have brought to the table is that clean code and refactoring are no longer expensive, and should no longer be bargained for, neglected or given the lip service that they have received throughout most of my career. Test suites and documentation, too.

(Given the nature of working with LLMs, I also suspect that clean, idiomatic code is more important than ever, since LLMs have presumably been trained on it. But this is just a personal superstition that is probably increasingly false, and also feels harmless.)

The only time I think it is appropriate to land a large amount of code at once is if it is a single act of entirely brain-dead refactoring that does nothing new, such as renaming a single variable across an entire codebase, or moving/breaking/consolidating a single module or file. And there had better be tests. Otherwise, get an LLM to break things up and make things easier for me to understand, for crying out loud: there are precious few reasons left not to make reviewing PRs as easy as possible.

So, I posit that the emotional reaction from certain audiences is still the largest, most exhausting difference.

grey-area 3 hours ago | parent | next [-]

> clean code and refactoring are no longer expensive

Are you contending that LLMs produce clean code?

WalterSear 3 hours ago | parent [-]

They do, for many people. Perhaps you need to change your approach.

dmurray an hour ago | parent [-]

If you can produce a clean design, the LLM can write the code.

fragmede an hour ago | parent [-]

Unless you're doing something fabulously unique (at which point I'm jealous you get to work on such a thing), they're pretty good at cribbing the design of things if it's something that's been well documented online (canonically, a CRUD SaaS app, with minor UI modification to support your chosen niche).

WalterSear 39 minutes ago | parent [-]

And if you are doing something fabulously unique, the LLM can still write all the code around it, likely help with many of the components, give you at least a first pass at tests, and enable rapid, meaningful refactors after each feature PR.

hatefulmoron an hour ago | parent | prev [-]

I don't really understand your point. It reads like you're saying "I like good code, it doesn't matter if it comes from a person or an LLM. If a person is good at using an LLM, it's fine." Sure, but the problem people have with LLMs is their _propensity_ to create slop in comparison to humans. Dismissing other people's observations as purely an emotional reaction just makes it seem like you haven't carefully thought about other people's experiences.

fragmede an hour ago | parent | prev [-]

If you try to one-shot it, sure. But you can question Claude, point out the error of its ways, tell it to refactor and ultrathink, and point out that two things have similar functionality and could be merged. It can write unhinged code with duplicate, unused variable definitions that doesn't work, but it'll fix it up if you call it out, or you can just do it yourself. (Cue questions of whether, in that case, it would just be faster to do it yourself.)

hatefulmoron an hour ago | parent [-]

I have a Claude Max subscription. When I think of bad Claude code, I'm not thinking about unused variable definitions. I'm thinking about the times you turn on ultrathink, allow it to access tools and negotiate its solution, and it still churns out an overcomplicated yet only partially correct solution that breaks. I totally trust Claude to fix linting errors.

WalterSear 37 minutes ago | parent | next [-]

If you are getting garbage out, you are asking it for too much at once. Don't ask for solutions - ask for implementations.

hatefulmoron 23 minutes ago | parent [-]

Distinction without a difference. I'm talking about its output being insufficient, whatever word you want to use for output.

fragmede 26 minutes ago | parent | prev [-]

It's hard to really discuss in the abstract, though. Why was the generated code overly complicated? (I mean, I believe you when you say it was, but it doesn't leave much room for discussion.) Similarly, what was partially correct about it? How many additional prompts does it take before you a) use it as a starting point, b) use it because it works, c) don't use any of it and just throw it away, or d) post about why it was lousy to all of the Internet reachable from your local ASN?

hatefulmoron 6 minutes ago | parent [-]

I've read your questions a few times and I'm a bit perplexed. What kind of answers are you expecting me to give you here? Surely if you use Claude Code or other tools, you'd know that the answers are so varying and situation-specific that it's not really possible for me to give you solid ones.

exe34 2 hours ago | parent | prev [-]

Are you quite sure that's the only difference you can think of? Let me give you a hint: is there any difference in the volume for the same cost at all?

rablackburn 4 hours ago | parent | prev [-]

> It would be willfully ignorant to pretend that there's not an explosion of a novel and specific kind of stupidity

I 100% know what you mean, and largely agree, but you should check out the guidelines, specifically:

> Don't be curmudgeonly. Thoughtful criticism is fine, but please don't be rigidly or generically negative.

And like, the problem _is_ *bad*. A fun, ongoing issue at work is trying to coordinate with a QA team who believe ChatGPT can write CSS selectors for HTML elements that have not yet been written.

That same QA team deeply cares about the spirit of their work, and is motivated by the _very_ relatable sentiment of: you DON'T FUCKING BREAK USER SPACE.

Yeah, in the unbridled, chaotic, raging plasma that is our zeitgeist at the moment, I'm lucky enough to have people dedicating a significant portion of their life to trying to do quality assurance in the idiomatic, industry best-standard way. Blame the FUD, not my team.

I would put it to you that they do not (yet) grok what, for lack of a more specific, universally understood term, we are calling "AI" (or LLMs if you are Fancy, but of course none of these labels are quite right). People need time to observe and learn. And people are busy with /* gestures around vaguely at everything */.

So yes, we should acknowledge that long-winded trash PRs from AI are a new, emergent problem, and yes, if we study the specific problem more closely, we will almost certainly find better approaches.

Writing off the issue as "stupidity" is mean. In both senses.

cespare 3 hours ago | parent | prev [-]

It is 1995. You get an unsolicited email with a dubious business offer. Upon reflection, you decide it's not worth consideration and delete it. No need to wonder how it was sent to you; that doesn't need to influence the way you handle it.

No. We need spam filters for this stuff. If it isn't obvious to you yet, it will be soon. (Or else you're one of the spammers.)

baq 39 minutes ago | parent [-]

Didn’t even hit the barn, sorry. Codegen tools were obvious, review assistance tools are very lagging, but will come.