gdulli 9 hours ago

It would be willfully ignorant to pretend that there's not an explosion of a novel and specific kind of stupidity, and to not handle it with due specificity.

rablackburn 9 hours ago | parent | next [-]

> It would be willfully ignorant to pretend that there's not an explosion of a novel and specific kind of stupidity

I 100% know what you mean, and largely agree, but you should check out the guidelines, specifically:

> Don't be curmudgeonly. Thoughtful criticism is fine, but please don't be rigidly or generically negative.

And like, the problem _is_ *bad*. A fun, ongoing issue at work is trying to coordinate with a QA team who believe ChatGPT can write CSS selectors for HTML elements that have not yet been written.
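(To make that concrete, here's a contrived sketch in Playwright-flavoured TypeScript; the page, selector, and test are all made up:)

```typescript
import { test, expect } from "@playwright/test";

// Hypothetical example: a selector "guessed" for markup that nobody
// has written yet. Whatever the model invents (a class name, a
// data-testid) is fiction until the component actually exists.
test("submit button is visible", async ({ page }) => {
  await page.goto("https://example.test/checkout"); // hypothetical page

  // The guess. The real element, once written, might use a different
  // tag, a different class, or a data-testid instead.
  const button = page.locator("button.checkout-submit");

  await expect(button).toBeVisible(); // fails until the HTML exists
});
```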

That same QA team deeply cares about the spirit of their work, and is motivated by the _very_ relatable sentiment of: you DON'T FUCKING BREAK USER SPACE.

Yeah, in the unbridled, chaotic, raging plasma that is our zeitgeist at the moment, I'm lucky enough to have people dedicating a significant portion of their lives to trying to do quality assurance in the idiomatic, industry best-standard way. Blame the FUD, not my team.

I would put to you that they do not (yet) grok what, for lack of a more specific universally understood term, we are calling "AI" (or LLMs, if you are fancy; of course none of these labels are quite right). People need time to observe and learn. And people are busy with /* gestures around vaguely at everything */.

So yes, we should acknowledge that long-winded trash PRs from AI are a new, emergent problem, and yes, if we study the specific problem more closely we will almost certainly find better approaches.

Writing off the issue as "stupidity" is mean. In both senses.

watwut 43 minutes ago | parent [-]

I do not think that is being curmudgeonly. Instead, OP is absolutely right.

We collectively used the strategy of "we pretend we are naively stupid and don't talk directly about issues" in multiple areas ... and it failed every single time, in all of them. It never solves the problem; it just invites bad/lazy/whatever actors to play manipulative semantic games.

WalterSear 9 hours ago | parent | prev [-]

I contend that far and away the biggest difference between entirely human-generated slop and AI-assisted stupidity is the irrational reaction that some people have to AI-assisted stuff.

hatefulmoron 8 hours ago | parent | next [-]

Calling things "slop" is just begging the question. The real differentiating factor is that, in the past, "human-generated slop" at least took effort to produce. Perhaps, in the process of producing it, the human notices what's happening and reconsiders (or even better, improves it such that it's no longer "slop".) Claude has no such inhibitions. So, when you look at a big bunch of code that you haven't read yet, are you more or less confident when you find out an LLM wrote it?

WalterSear 8 hours ago | parent | next [-]

I have pretty much the same amount of confidence whether I receive AI-generated or non-AI-generated code to review: my confidence is based on the person guiding the LLM, and their ability to do that.

Much more so than before, I'll comfortably reject a PR that is hard to follow, for any reason, including size. IMHO, the biggest change that LLMs have brought to the table is that clean code and refactoring are no longer expensive, and should no longer be bargained for, neglected or given the lip service that they have received throughout most of my career. Test suites and documentation, too.

(Given the nature of working with LLMs, I also suspect that clean, idiomatic code is more important than ever, since LLMs have presumably been trained on it. But this is just a personal superstition: probably increasingly false, but harmless.)

The only time I think it is appropriate to land a large amount of code at once is for a single act of entirely brain-dead refactoring that does nothing new, such as renaming a single variable across an entire codebase, or moving/breaking/consolidating a single module or file. And there had better be tests. Otherwise, get an LLM to break things up and make them easier for me to understand, for crying out loud: there are precious few reasons left not to make reviewing PRs as easy as possible.
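(A rename like that barely needs an LLM at all. A minimal sketch of the sort of mechanical change I mean, with hypothetical identifier names:)

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Minimal sketch of a "brain-dead" rename: one identifier changed
// everywhere, nothing else touched. OLD/NEW are hypothetical names.
const OLD = "getUserById";
const NEW = "fetchUserById";

// Recursively collect .ts files, skipping node_modules.
function walk(dir: string): string[] {
  return fs.readdirSync(dir, { withFileTypes: true }).flatMap((entry) => {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      return entry.name === "node_modules" ? [] : walk(full);
    }
    return full.endsWith(".ts") ? [full] : [];
  });
}

for (const file of walk("src")) {
  const before = fs.readFileSync(file, "utf8");
  // Word boundaries so we don't clobber e.g. getUserByIdOrThrow.
  const after = before.replace(new RegExp(`\\b${OLD}\\b`, "g"), NEW);
  if (after !== before) fs.writeFileSync(file, after);
}
```

The resulting diff may be huge, but every hunk is the same one-token change; that is exactly what makes it reviewable in one sitting.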

So, I posit that the emotional reaction from certain audiences is still the largest, most exhausting difference.

grey-area 7 hours ago | parent | next [-]

> clean code and refactoring are no longer expensive

Are you contending that LLMs produce clean code?

WalterSear 7 hours ago | parent [-]

They do, for many people. Perhaps you need to change your approach.

dmurray 6 hours ago | parent [-]

If you can produce a clean design, the LLM can write the code.

WalterSear 5 hours ago | parent | next [-]

I think maybe there's another step too: breaking the design up into small enough pieces that the LLM can follow it and you can understand the output.

TexanFeller 2 hours ago | parent [-]

So: do all the hard work yourself and let the AI do some of the typing, which you'll then have to spend extra time reviewing closely in case its RNG factor changed an important detail. And with all the extra up-front design, planning, instructions, and context you need to provide to the LLM, I'm not sure I'm saving on typing. A lot of people recommend going meta and having LLMs generate a good prompt and sequence of steps to follow, but I've only seen that kinda-sorta work for the most trivial tasks.

fragmede 6 hours ago | parent | prev [-]

Unless you're doing something fabulously unique (in which case I'm jealous you get to work on such a thing), they're pretty good at cribbing the design of things that are well documented online (canonically, a CRUD SaaS app with minor UI modifications to support your chosen niche).

WalterSear 5 hours ago | parent [-]

And if you are doing something fabulously unique, the LLM can still write all the code around it, likely help with many of the components, give you at least a first pass at tests, and enable rapid, meaningful refactors after each feature PR.

hatefulmoron 6 hours ago | parent | prev [-]

I don't really understand your point. It reads like you're saying "I like good code; it doesn't matter if it comes from a person or an LLM. If a person is good at using an LLM, it's fine." Sure, but the problem people have with LLMs is their _propensity_ to create slop compared to humans. Dismissing other people's observations as purely an emotional reaction just makes it seem like you haven't carefully thought about their experiences.

WalterSear 5 hours ago | parent [-]

My point is that, if I can do it right, others can too. If someone's LLM is outputting slop, they are obviously doing something different: I'm using the same LLMs.

All the LLM hate here isn't observation, it's sour grapes. Complaining about slop and poor-quality output is confessing that you haven't taken the time to understand what is reasonable to ask for, and aren't educating your junior engineers on how to interact with LLMs.

lukan an hour ago | parent [-]

"My point is that, if I can do it right, others can too."

Can it also be that different people work in different areas, and LLMs are not equally good in all areas?

fragmede 6 hours ago | parent | prev [-]

If you try to one-shot it, sure. But you can question Claude, point out the error of its ways, tell it to refactor and ultrathink, and point out that two things have similar functionality and could be merged. It can write unhinged code, with duplicate unused variable definitions, that doesn't work; it'll fix it up if you call it out, or you can just do it yourself. (Cue questions of whether, in that case, it would just be faster to do it yourself.)

hatefulmoron 6 hours ago | parent [-]

I have a Claude Max subscription. When I think of bad Claude code, I'm not thinking about unused variable definitions. I'm thinking about the times you turn on ultrathink, allow it to access tools and negotiate its solution, and it still churns out an over-complicated yet only partially correct solution that breaks. I totally trust Claude to fix linting errors.

WalterSear 5 hours ago | parent | next [-]

If you are getting garbage out, you are asking it for too much at once. Don't ask for solutions; ask for implementations.

hatefulmoron 5 hours ago | parent [-]

Distinction without a difference. I'm talking about its output being insufficient, whatever word you want to use for output.

WalterSear 4 hours ago | parent [-]

And I'm arguing that if the output wasn't sufficient, neither was your input.

You could also be asking for too much in one go, though that's becoming less and less of a problem as LLMs improve.

hatefulmoron 4 hours ago | parent [-]

You're proposing a truism: if you don't get a good result, it's either because your query is bad or because the LLM isn't good enough to provide a good result.

Yes, that is how this works. I'm talking about the case where you're providing a good query and getting poor results. Claiming that this can be solved by more LLM conversations and ultrathink is cope.

WalterSear 3 hours ago | parent [-]

I've claimed neither. I actually prefer restarting or rolling back quickly rather than trying to rework suboptimal outputs: less chance of being rabbit-holed. Just add what I've learned to the original ticket/prompt.

'Git gud' isn't much of a truism.

fragmede 5 hours ago | parent | prev [-]

It's hard to really discuss in the abstract, though. Why was the generated code overly complicated? (I mean, I believe you when you say it was, but it doesn't leave much room for discussion.) Similarly, what was partially correct about it? How many additional prompts does it take before you a) use it as a starting point, b) use it because it works, c) don't use any of it and just throw it away, or d) post about why it was lousy to all of the Internet reachable from your local ASN?

hatefulmoron 5 hours ago | parent [-]

I've read your questions a few times and I'm a bit perplexed. What kind of answers are you expecting me to give you here? Surely if you use Claude Code or other tools, you'd know that the answers are so varied and situation-specific that it's not really possible for me to give solid answers.

JoshTriplett 6 hours ago | parent | prev | next [-]

Many of the people who submit 9000-line AI-generated PRs today would, for the most part, not have submitted PRs at all before, or would not have made something that passes CI, or would not have built something that looks sufficiently plausible to make people spend time reviewing it.

WalterSear 5 hours ago | parent [-]

9000-line PRs were never a good idea; they only seemed plausible because we were forced to accept bad PR-review practices. Coding was expensive, and management beat us into LGTMing them into the codebase to keep the features churning.

Those days are gone. Coding is cheap. The same LLMs that enable people to submit 9000-line PRs of chaos can be used to quickly turn them into more sensible work. If they genuinely can't do a better job, rejecting the PR is still the right response. Just push back.

exe34 7 hours ago | parent | prev [-]

Are you quite sure that's the only difference you can think of? Let me give you a hint: is there any difference in the volume for the same cost at all?