danpalmer 13 hours ago

I recently had a quandary at work. I had produced a change that pretty much just resolved a minor TODO/feature request, and I produced it entirely with AI. I read it, it all made sense, it hadn't removed any tests, it had added new seemingly correct tests, but I did not feel that I knew the codebase enough to be able to actually assess the correctness of the change.

I want to do good engineering, not produce slop, but for 1 min of prompting, 5 mins of tidying, and 30 mins of review, we might save 2 days of eng time. That has to be worth something.

I could see a few ways forward:

1. Drop it, submit a feature request instead, and include the diff as optional inspiration.

2. Send it, but be clear that it came from AI and that I don't know if it works, and ask the reviewers to pay special attention to it because of that...

3. Send it as normal, because it passes tests/linters, and review should be the same regardless of author or provenance.

I posted this to a few chat groups and got quite a range of opinions, including varying the approach based on how much I like the maintainer. Strong opinions for (1), weak preferences for (2), and a few advocating for (3).

Interestingly, the pro-AI folks almost universally doubled down and said that I should use AI more to gain more confidence – ask how can I test it, how can we verify it, etc – to move my confidence instead of changing how review works.

I thought that was an interesting idea that I hadn't pushed enough, so I spent a further hour or so prompting around ways to gain confidence, throughout which the AI "fixed" so many things to "improve" the code that I completely lost all confidence in the change because there were clearly things that were needed and things that weren't, and disentangling them was going to be way more work than starting from scratch. So I went with option 1, and didn't include a diff.

Balinares 11 hours ago | parent | next [-]

Aside from anything else, you have good engineering instincts, and I wish more people in the industry were like you.

danpalmer 11 hours ago | parent [-]

Thanks, doing my best. It's one of the reasons I want to get more of my AI-skeptical colleagues onboard with AI development. They're skeptical for good reasons, but right now so much progress is being driven by those who lack skills, taste, or experience. I understand those with lots of experience being skeptical at the claims, I like to think I am too, but I think there's clearly something here, and I want more people who are skeptical to shape the direction and future of these technologies.

ithkuil 10 hours ago | parent [-]

Being a skeptic doesn't make one an irrational hater (though such people surely exist, and they might be noisy enough to taint all skeptics by association).

I am learning how to make good use of agent assisted engineering and while I'm positively impressed with many things they can do, I'm definitely skeptical about various aspects of the process:

1. Quality of the results
2. Maintainability
3. Overall time saved

There are still open problems because we're introducing a significant change in the tooling while keeping the rest of the process unchanged (often for good reasons). For example, consider the imbalance in code review cost: some people produce tons of changes and the rest of the team is drowned in the review burden.

This new wave of tooling is undoubtedly going to transform the way that software is developed, but I think people jump too quickly to the conclusion that we've already figured out exactly what that is going to look like.

p0w3n3d 5 hours ago | parent [-]

I'd say that the worst thing that can happen to a developer using Claude etc. is detachment from the code.

At some point the code stops being "yours": you don't recognise it anymore, you have no connection to it. It's as if every day you were working at another company...

strogonoff 8 hours ago | parent | prev | next [-]

Here’s what you could do if you somehow found yourself with an LLM-generated change to a codebase implementing a feature you want, and you wanted to do the most to expedite the implementation of that feature without disrespecting and alienating maintainers:

1. Go through all changes, understand what changed and how it solves the problem.

2. Armed with that understanding, write (by hand) a high-level summary of what can be done (and why) to implement your feature.

3. Write a regular feature request, and include that summary in it (as an appendix).

Not long ago I found myself on the receiving end of a couple of LLM-generated PRs and partly LLM-generated issue descriptions with purported solutions. Both were a bit of a waste of time.

The worst thing about the PRs is that you cannot engage in a good-faith, succinct, quick "why" sort of discussion with the submitter as you go through the changes. Also, when the PR fails to notice a large-scale pre-existing pattern I would want to follow to reduce mental overhead, and instead writes something completely new, I have to discard it.

For issues and feature requests, there was some "investigation" the submitter thought would be helpful to me. It ended up a bit misleading, and at the same time I noticed that people may want to spend the same total amount of effort on writing it up, except now part of that effort goes towards their interaction with some LLM. So, I asked them to just focus on describing the issue from their human perspective—if they feel like they have extra time and energy, they should put more into that instead.

If it happens at work, I obviously still get paid to handle this, but I would have to deprioritise submissions from people who ignore my requests.

zozbot234 8 hours ago | parent [-]

> Go through all changes, understand what changed and how it solves the problem.

GP has said that they can't do this, since they're unfamiliar with the language and that specific part of the codebase. Their best bet AIUI is (1) ask the AI agent to reverse engineer the diff into a high-level plan that they are qualified to evaluate and revise, if feasible, so that they can take ownership of it and make it part of the feature request, and (2) attach the AI-generated code diff to the feature req as a mere convenience, labeling it very clearly as completely unrevised AI slop that simply appears to address the problem.

strogonoff 3 hours ago | parent [-]

Not being familiar with a part of a codebase is not an incurable condition.

If the conclusion you draw from that is that there is no workaround, then let that be the entire point. The alternatives are to get over yourself and ask people to implement a feature, or to understand how to help and then help.

The former is what OP did; for the latter, I described what I see as an efficient way of achieving it while making use of an LLM-produced PR.

vova_hn2 8 hours ago | parent | prev | next [-]

> I did not feel that I knew the codebase enough to be able to actually assess the correctness of the change.

> I want to do good engineering, not produce slop, but for 1 min of prompting, 5 mins of tidying, and 30 mins of review, we might save 2 days of eng time.

I don't really understand where the "2 days of engineering time" come from.

What exactly would prevent someone who does know the codebase from doing "1 min of prompting, 5 mins of tidying, and 30 mins of review", but then actually understanding whether the changes make sense or not?

More general question: why do so many slopposters act like they are the only ones who have access to a genAI tool? Trust me, I also have access to all this stuff, so if I wanted to read a bunch of LLM-slop I could easily go and prompt it myself, there is no need to send it to me.

Related link: https://claytonwramsey.com/blog/prompt/ (hn discussion: https://news.ycombinator.com/item?id=43888803 )

darkwater 9 hours ago | parent | prev | next [-]

> Interestingly, the pro-AI folks almost universally doubled down and said that I should use AI more to gain more confidence – ask how can I test it, how can we verify it, etc – to move my confidence instead of changing how review works.

I think this is a good suggestion, and it's what I usually do. If, at work, Claude generates something I don't fully understand, and what it has generated works as expected when experimentally tested, I ask it "why did you put this here? what is this construct for? how will this handle this edge case?" and specifically tell it not to modify anything, just answer the question. This way I can process its output "at human speed" and actually make it mine.

pduggishetti 13 hours ago | parent | prev | next [-]

Do you use the library? If yes, test it in prod or even staging with your patch, then submit the review.

danpalmer 11 hours ago | parent [-]

Unfortunately that's not possible in this case for technical reasons: it's not a library in the traditional sense, it would be significant work to fork, etc. This is in the Google monorepo.

lawn 12 hours ago | parent | prev | next [-]

> but I did not feel that I knew the codebase enough to be able to actually assess the correctness of the change.

The good engineering approach is to verify that the change is correct. More prompts to the AI do nothing; instead, play with the code, try to break it, write more tests yourself.

danpalmer 11 hours ago | parent [-]

I exhausted my ability to do this (without AI). It was a codebase I don't know, in a language I don't know, solving a problem that I have a very limited viewpoint of.

These are all reasons why pre-AI I'd never have bothered to even try this, it wouldn't be worth my time.

If you think this is therefore "bad engineering", maybe that's true! As I said, I ended up discarding the change because I wasn't happy with it.

gwbas1c 4 hours ago | parent [-]

> I exhausted my ability to do this (without AI). It was a codebase I don't know, in a language I don't know, solving a problem that I have a very limited viewpoint of.

And that's the critical point! I think it's fine to send the diff in; and clearly mark it as AI / vibe-coded. (Along with your prompts.)

zephyruslives 7 hours ago | parent | prev | next [-]

>I thought that was an interesting idea that I hadn't pushed enough, so I spent a further hour or so prompting around ways to gain confidence, throughout which the AI "fixed" so many things to "improve" the code that I completely lost all confidence in the change because there were clearly things that were needed and things that weren't, and disentangling them was going to be way more work than starting from scratch.

I feel this so much. In my opinion, all of the debate around accepting AI generated stuff can be boiled down to one attribute, which is effort. Personally, I really dislike AI generated videos and blogs for example, and will actively avoid them because I believe I "deserve more effort".

Similarly for AI-generated PRs: I roll my eyes when I see an AI PR, and I'm quicker to dismiss it than a human-written one. In my opinion, if the maintainers cannot hold the human accountable for the AI-generated code, then it shouldn't be accepted. This involves asking questions, and expecting the human to respond.

I don't know if we should gatekeep based on effort or not. Obviously the downside is that you reduce the "features shipped" metric a lot if you expect the human to put in the same, or a comparable, amount of effort as they would have done otherwise. Despite the downside, I'm still pro gatekeeping based on effort (it doesn't help that most of the people trying to convince us otherwise are using the very same low-effort methods that they're trying to convince us to accept). But, as in most things, one must keep an open mind.

PunchyHamster 9 hours ago | parent | prev [-]

To be entirely fair "sorta working, solving a problem but not really all that great for the rest of the codebase" PRs are human thing too.

The problem is AI generating them en masse, and frankly most people put in far less effort than even your first paragraph describes, blindly pushing stuff they have not even read, let alone understood.

> Interestingly, the pro-AI folks almost universally doubled down and said that I should use AI more to gain more confidence – ask how can I test it, how can we verify it, etc – to move my confidence instead of changing how review works.

Well, it's not terrible at just getting your bearings in a codebase. The most productive use I've got out of it is treating it as "turbo grep" to look around existing codebases and figure things out.