markbao 6 hours ago

If you save 3 hours building something with agentic engineering and that PR sits in review for the same 30 hours or whatever it would have spent sitting in review if you handwrote it, you’re still saving 3 hours building that thing.

So in that extra time, you can now stack more PRs that still have a 30 hour review time and have more overall throughput (good lord, we better get used to doing more code review)

This doesn’t work if you spend 3 minutes prompting and 27 minutes cleaning up code that would have taken 30 minutes to write anyway, as the article details, but that’s a different failure case imo

josephg 6 hours ago | parent | next [-]

If your team's bottleneck is code review by senior engineers, adding more low quality PRs to the review backlog will not improve your productivity. It'll just overwhelm and annoy everyone who's gotta read that stuff.

Generally if your job is acting as an expensive frontend for senior engineers to interact with claude code, well, speaking as a senior engineer I'd rather just use claude code directly.

eru 6 hours ago | parent [-]

Linting, compiler warnings and automated tests have helped a lot with the grunt work of code review in the past.

We can use AI these days to add another layer.
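A hedged sketch of what that extra layer could look like: a pre-review gate that runs a stack of mechanical checks before any human sees the PR. The check commands here are placeholders (they just print and exit 0), not a real project's lint or test invocations.

```python
import subprocess
import sys

# Hypothetical pre-review gate: each entry is (name, command). The commands
# below are stand-ins; a real project would call its own linter and test
# runner here (and could add an AI-review pass as one more entry).
CHECKS = [
    ("lint",  [sys.executable, "-c", "print('lint ok')"]),
    ("tests", [sys.executable, "-c", "print('tests ok')"]),
]

def run_gate(checks):
    """Run every check; return True only if all of them exit cleanly."""
    all_ok = True
    for name, cmd in checks:
        result = subprocess.run(cmd, capture_output=True, text=True)
        status = "pass" if result.returncode == 0 else "FAIL"
        print(f"{name}: {status}")
        all_ok = all_ok and result.returncode == 0
    return all_ok

if __name__ == "__main__":
    sys.exit(0 if run_gate(CHECKS) else 1)
```

The point of the layering is that a PR only reaches human review after the cheap mechanical layers pass.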

lelanthran 5 hours ago | parent | prev | next [-]

> So in that extra time, you can now stack more PRs that still have a 30 hour review time and have more overall throughput

Hang on, you think that a queue that drains at a rate of $X/hour can be filled at a rate of 10x$X/hour?

No, it cannot: it doesn't matter how fast you fill a queue with a constant drain rate; sooner or later you either hit the bounds of the queue or the items coming off it are too stale to matter.

In this case, filling the queue at a rate of 20 items per hour (one every 3 minutes) while it drains at a rate of 1 item every 5 hours means that after a single 8-hour day you have (8x20) - 1 = 159 PRs still waiting.

IOW, after a single day your last PR is 159 x 5 = 795 hours away from review. Your PRs after the second day are going to wait 1500+ hours.
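The arithmetic above can be sanity-checked in a few lines, treating review as strictly serial (the pessimistic case the comment assumes):

```python
# PRs arrive every 3 minutes during one 8-hour day, while review
# completes one PR every 5 hours.
FILL_PER_HOUR = 20        # one PR every 3 minutes
REVIEW_HOURS_EACH = 5     # one review finished every 5 hours
WORKDAY_HOURS = 8

arrived = FILL_PER_HOUR * WORKDAY_HOURS          # 160 PRs submitted
reviewed = WORKDAY_HOURS // REVIEW_HOURS_EACH    # 1 review finished in-day
backlog = arrived - reviewed                     # 159 PRs still queued
wait_for_last = backlog * REVIEW_HOURS_EACH      # 795 hours of waiting

print(f"backlog after day 1: {backlog} PRs")
print(f"last PR reviewed in ~{wait_for_last} hours")
```

Each extra day of writing at that pace adds roughly another 160 PRs, so the wait for the newest PR grows without bound unless the drain rate changes.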

zmmmmm 5 hours ago | parent | next [-]

This is the fundamental issue in my current situation with AI code generation.

There are some strategies that help. A lot of the AI directives need to go towards making the code actually easy to review. A lot of it sits around clarity and granularity (code should be committed primarily in reviewable chunks, units of work that make sense for review) rather than whatever you would have done previously when code production was the bottleneck.

Similarly, AI use needs to be weighted not just more towards tests, but towards tests that concretely and clearly answer questions that come up in review (what happens on this boundary condition? what if that variable is null? etc).

Finally, changes need to be stratified along lines of risk rather than code modularity or other dimensions. That is, if a change is evidently risk-free (in the sense of "even if this IS broken, it doesn't matter"), it should be able to be rapidly approved and merged. Only things where it actually matters if it's wrong should be blocked.
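A hedged illustration of what "tests that answer a reviewer's questions" might look like. `normalize_discount` is a made-up example function, not anything from the thread; the point is that each assert pre-empts a question a reviewer would otherwise have to ask on the PR.

```python
def normalize_discount(pct):
    """Clamp a discount percentage into [0, 100]; None means no discount."""
    if pct is None:
        return 0
    return max(0, min(100, pct))

# What happens on the boundary conditions?
assert normalize_discount(0) == 0
assert normalize_discount(100) == 100
assert normalize_discount(150) == 100   # over-limit clamps down
assert normalize_discount(-5) == 0      # negative clamps up

# What happens if that variable is null?
assert normalize_discount(None) == 0
```

A reviewer who sees these asserts pass never needs to bounce the PR back with "what does this do for null?".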

I have a feeling there are whole areas of software engineering where best practices are just operating on inertia and need to be reformulated now that the underlying cost dynamics have fundamentally shifted.

balamatom 4 hours ago | parent [-]

>Finally, changes need to be stratified along lines of risk rather than code modularity or other dimensions.

Why don't those other dimensions, and especially the code modularity, already reflect the lines of business risk?

Lemme guess, you cargo culted some "best practices" to offload risk awareness, so now your code is organized in "too big to fail" style and matches your vendor's risk profile instead of yours.

zmmmmm 4 hours ago | parent [-]

> Why don't those other dimensions, and especially the code modularity, already reflect the lines of business risk?

I guess the answer (if you're really asking seriously) is that previously when code production cost so far outweighed everything else, it made sense to structure everything to optimise efficiency in that dimension.

So if a change was implemented, the developer would deliver it as a functional unit that might cut across several lines of risk (low-risk changes like updating some CSS sitting alongside higher-risk ones like a database migration, all bundled together), because this was what made it fastest for the developer to implement the code.

Now if AI is doing it, screw how easy or fast it is to make the change. Deliver it in reviewable chunks.

Was the original method cargo culted? I think most of what we do is cargo culted regardless. Virtually the entire software industry is built that way. So probably.

balamatom 4 hours ago | parent | prev [-]

You are considering a good-faith environment where GP cares about throughput of the queue.

I think GP is thinking in terms of being incentivized by their environment to demonstrate an image of high personal throughput.

In a dysfunctional organization one is forced to overpromise and underdeliver, which the AI facilitates.

CuriouslyC 6 hours ago | parent | prev [-]

Except that when you have 10 PRs out, it takes longer for people to get to them, so you end up backlogged.

zmmmmm 5 hours ago | parent [-]

And when the PR you never even read because the AI wrote it gets bounced back to you with an obscure question 13 days later... you're not going to be well positioned to respond to that.