kentm 8 hours ago

It's exactly this. I have had a few LLM coding sessions where I reviewed the resulting work and thought "I don't think my team can safely PR this." I then went back and broke it down into smaller PRs, still using LLMs but at a size that is easy to review. And I reviewed the output myself before I asked a reviewer to commit their time.

The problem is that this is increasingly seen as a non-productive workflow slowing everyone else down, so the pressure is growing for writers to just shove massive PRs out the door and reviewers to use LLMs to make that tractable. I suppose those advocates have more faith in LLM output compared to humans than I do.

greiskul 6 hours ago | parent | next [-]

That's the thing with giant PRs. They never really needed to be reviewed anyway. In places with a strong review culture where I have worked, if you send me a thousand-line PR and ask me to review it, I will look at the giant blob of text and immediately fire off an "it's too long, can you cut it into smaller PRs?".

Because I don't trust myself to review a giant PR. It takes too much cognition to properly review it.

And now that people are making PRs with AI, this is even more important. If the AI was good enough to have coded it, please instruct it to make the changes in reviewable chunks.

majormajor 7 hours ago | parent | prev | next [-]

> the pressure is growing for writers to just shove massive PRs out the door and reviewers to use LLMs to make that tractable

Even in these move-fast envs, it should be reasonably apparent that the author should be using the LLM to make the PR tractable, not solely using the LLM to shovel out a giant PR + slop PR description.

And the LLMs can often do this - if you ask them to restructure or break up a big change differently, they can often make quite reasonable suggestions and help with it. That's just not what you're gonna get if you're lazy. If you want a small LLM-generated change, often you have to start with a big one then ask it to figure out what it can get rid of, since many times it doesn't have a perfect model of all the code in its "head" before it starts spitting stuff out. The big companies have been doing their best to automate this for the last couple of years vs the even-more-blind attempts you used to get, but there's still the issue of the models+tools following generic advice aimed at median codebases vs being intimately familiar with this codebase.

You can go fast without being lazy. And when going fast, in some ways, it's more important than ever to put in the effort to not blow things up.

kentm 6 hours ago | parent [-]

It should be but often isn't. There have been a lot of threads on HN where the response to huge PRs wasn't "Don't do that, use AI better when authoring" but "The reviewers are actually the problem, they're missing the AI train". And I see this in industry too.

gedy 7 hours ago | parent | prev [-]

> I suppose those advocates have more faith in LLM output compared to humans than I do.

Some of this is the funny situation where the faithful will state: "This writes better code than I do!" and miss the irony of: "yes, yes it does"

themgt 7 hours ago | parent | next [-]

> Some of this is the funny situation where the faithful will state: "This writes better code than I do!" and miss the irony of: "yes, yes it does"

"Blessed are the humble ..."

ErroneousBosh 7 hours ago | parent | prev [-]

> "This writes better code than I do!" and miss the irony of: "yes, yes it does"

I guess it depends on what you consider "better". I've tried using LLMs to write code over the past couple of weeks with extremely mixed results.

The LLM certainly writes more interesting code! They like their cute ASCII/unicode animations, don't they?

It definitely writes a lot more code, none of it actually correct but some of it functionally similar to correct code.

If you like lots of code then I guess that's better. I like less code.

kentm 6 hours ago | parent | next [-]

I find it can often write correct code but not maintainable, performant, or reviewable code without additional human guidance. The "solution" frequently given is that humans don't need to maintain it anymore so it's not actually a problem. But the agent can't be accountable for mistakes, so unless that changes or the risk of a defect is close to zero, one still has to put forth effort to keep the code maintainable.

To be fair, there are plenty of situations where throwaway code is perfectly fine and/or defect risks are low enough to make the trade-off worth it. I don't think a lot of developers are thinking about it in that context, though.

(No, unit tests aren't enough)

TylerE 5 hours ago | parent | prev | next [-]

> They like their cute ASCII/unicode animations, don't they?

One of the few global Claude directives I have set up is to never use emojis - and it never has, either in chat output or in code. Don't blame the tool when you don't spend 30 seconds configuring it. It's even easier with AI since you don't have to go digging for some obscure .vimrc snippet - it's literally just plain English.

gedy 6 hours ago | parent | prev [-]

Yes, I basically meant those folks weren't very good developers to begin with and are now extrapolating to: "wow this is better than all devs!", when it's more like "it's you, dude"

ErroneousBosh 6 hours ago | parent [-]

This sounds awfully like the people who think that self-driving cars and even auto-braking systems will eliminate all accidents, because everyone else is as bad a driver as they are.

ryandrake 2 hours ago | parent [-]

Someone pointed out[1] a while ago that LLMs look good at things you are bad at. Which is, I think, one of the best explanations of why so many people disagree about how good they are at programming. There are a lot of people really bad at programming, and they will look at the output of an LLM and say “Wow, it’s so much better than my code!”

1: https://news.ycombinator.com/item?id=48315309