> Those are not code problems. They are evaluation problems.

> Code becomes precious when it is the only place knowledge lives.

Reading AI code all day is _agonizing_. Just, a horrible way to live, and it melts people's brains at the moment you need them to be the most capable.

Manual programming has this really productive and gratifying feedback loop, where you read the code, write the code, and fix it until it compiles/runs/does what you want. AI code not only does half that for you, but it makes the "click" at the end uninspiring because you're never sure if it's cheated a bit to get to that moment.

Trying to operate with AI-generated code as the only durable artifact of programming is a dead end for the industry. Charity points to (and correctly discards) architecture diagrams/specs as an interesting space to work in. My suspicion is that it's closer to the thing that's hand-written: prompts, markdown plans, and other nudges. Focus on the thing that you, as a human, produce: that's the basis for the core loop of "did the AI follow my instructions," and it's higher-leverage when you go to code review.

By the time you get to the PR, you've probably typed enough to Claude that you can regenerate the code, but the current industry default is to just throw away all those sessions and ship the code. That's backwards!

▲

agumonkey a minute ago | parent | next [-]

the act, eval, adjust loop is probably neurologically important.. reading about things you didn't dive into is really a dread

depending on your industry, you might be able to ship half-slop and then fix some bugs downstream though

▲

gavinh 6 minutes ago | parent | prev | next [-]

I agree that reading AI code all day is agonizing. We're relying on code review to develop parts of our mental model of the system that were previously developed through coding. We're having more difficulty comprehending and recalling details of the system. This is probably unsurprising; people recall information they "generated" better than information they read. I am applying some lessons from pedagogy to extend code review. If this resonates with you, I would like to talk.

▲

philbo 2 hours ago | parent | prev | next [-]

If a coworker dumped a 5k-line code review on you, you'd tell them to come back when it's broken down into smaller, reviewable chunks. Large dumps of code are basically unreviewable by humans, but it seems like a lot of people have forgotten about that when it comes to LLMs.

▲

roncesvalles 36 minutes ago | parent | next [-]

You aren't allowed to block PRs for being too large anymore. The objective that every engineer should be 2x/3x/5x more productive can only be achieved if you go totally lax on code reviews.

Because if all your SWEs produce 5x more code, it also means they have to review 5x more code. But LLMs don't really help with code reviews. Then it becomes a Metcalfian paradox unless you just rubberstamp PRs, which is what is expected of you.

▲

trjordan 2 hours ago | parent | prev | next [-]

I think it's worse than that. At least if I dumped 5k LoC on somebody in 2021, you knew I spent the time to write it, so it's "fair" to ask you to read it. But I didn't write it in 2026, so you shouldn't read it.

I think it's less about "break it down" and more about "let's communicate at the same altitude."

I wrote a (bait-titled) post about it: https://tern.sh/blog/stop-reading-prs/

▲

fusslo an hour ago | parent [-]

113 files +22913 −2423

305 files +15075 −13110

153 files +21934 −8698

125 files +28120 −2398

43 files +11188 −63

118 files +21564 −647

These are the largest (6 of 35) in the past 30 days. added: 190079 removed: 39696 in the last 6 months

from one person.

	▲	evdubs 20 minutes ago \| parent [-]
		I hope 99% of that was documentation and testing.
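For scale, the six stat lines quoted above can be totaled with a few lines of Python. This is only a sketch: `STAT_RE` and `total_stats` are illustrative names, not part of any tool mentioned in the thread, and the input is the text exactly as posted.

```python
import re

# Stat lines copied from the comment above. Note the removal sign is
# U+2212 (a typographic minus), not the ASCII hyphen.
STAT_LINES = """
113 files +22913 −2423
305 files +15075 −13110
153 files +21934 −8698
125 files +28120 −2398
43 files +11188 −63
118 files +21564 −647
"""

# Accept either the typographic minus or an ASCII hyphen before the removals.
STAT_RE = re.compile(r"(\d+) files \+(\d+) [−-](\d+)")


def total_stats(text):
    """Sum file changes, added lines, and removed lines across stat lines."""
    files = added = removed = 0
    for match in STAT_RE.finditer(text):
        f, a, r = (int(group) for group in match.groups())
        files += f
        added += a
        removed += r
    return files, added, removed


files, added, removed = total_stats(STAT_LINES)
print(f"{files} file changes, +{added} / -{removed}")
```

Those six PRs alone come to 120,794 added and 27,339 removed lines, before counting the other 29 from the same 30 days.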

▲

darth_aardvark 2 hours ago | parent | prev | next [-]

Breaking up a giant PR can be a tedious, time-consuming hassle, and in the past I could sympathize in practice if someone had a giant PR they didn't have time to decompose once they got it working.

But it's also the exact sort of thing that LLMs are literally perfect for in my experience so there's really no excuse anymore. I've never seen Claude fail to turn a 5k PR into a well-decomposed Graphite stack.

▲

win311fwg 2 hours ago | parent | prev | next [-]

It is not so much forgetting as much as it is acceptance that when welcoming AI into a codebase, the code can no longer matter; that all that matters is that the properties of the system are validated. That isn't a change that comes free, so nobody should be expecting magic, it is a different set of tradeoffs. There is no such thing as a panacea.

▲

hootz 2 hours ago | parent | prev | next [-]

I think they expect you to also use an LLM to review, and I bet they are doing exactly that when asked to review someone else's code.

	▲	acedTrex a few seconds ago \| parent \| next [-]
		There's really no diff between a rubber stamp and an LLM review; they both do the same thing.
	▲	latentsea an hour ago \| parent \| prev [-]
		That gets you 90% of the way there. So, it only really works if you accept the cruft and the risks associated with that last 10%. Been doing this day in and day out for the last few months, and no matter how many automated reviews we run or how good we get them, we still can't skip the manual ones.

▲

cmrdporcupine 2 hours ago | parent | prev [-]

> If a coworker dumped a 5k-line code review on you, you'd tell them to come back when it's broken down into smaller, reviewable chunks.

I would, and all my training at Google told me to do that. But what I found after I left that comfortable box was that somehow this kind of practice is acceptable in the industry at large and you're expected to just Deal With It(tm). 5k lines isn't even high by what I've seen.

Worse, the "code review" tools that people have access to in GitHub make it absolutely and totally unworkable to incrementally improve review. Messy merge commits full of "responding to code review" comments. Threads impossible to follow. Just bad tooling.

So a lot of shops, from what I've seen, are just yeeting it with very shallow reviews.

This is my observation from before agentic AI. LLMs just threw kerosene on that dumpster fire.

▲

mooreds 3 hours ago | parent | prev | next [-]

Are there any products out there that are capturing the prompts/sessions? I imagine you could do it in an ad hoc way, asking Claude to write up a summary of the session as part of the commit message. But is there anything else that's more structured/higher level?

	▲	sdesol 27 minutes ago \| parent \| next [-]
		I am working on solving the AI Code Provenance problem and I believe my repos may be the first that provide AI code provenance. See the following example: https://github.com/gitsense/gsc-cli/blob/main/internal/cli/r... Notice how the code block header attributes the model. The UUID can be traced to the conversation so everybody can tell exactly how the code came about. For this to work though, you need to use my chat app, as it ensures you can't tamper with things if you are truly serious about AI code provenance. I also have a much more human-focused method, which is part of my CLI tool: https://github.com/gitsense/gsc-cli I am currently looking at making pi (https://github.com/earendil-works/pi) support AI code provenance, but for now, if you want a more structured way to capture what you have done in an agent session that can be used in code reviews and be carried forward as knowledge that lives inside your repository, I have gsc lessons. The basic idea is that, after you have finished chatting/working with the agent, you work with it to identify lessons worth carrying forward. You can store your session if you want, but really, the lessons should be something that can help you review code better and prevent future mistakes. I have a real working example at https://github.com/gitsense/smart-ripgrep This is a fork of the BurntSushi/ripgrep repository. It shows how you can use lessons to learn from past design decisions.
	▲	trjordan 2 hours ago \| parent \| prev \| next [-]
		We're working on it, though it's all early. I'd love feedback: https://tern.sh The first product compares the code to the prompts and highlights places where the agent made decisions you weren't involved in: https://tern.sh/docs/tours/
	▲	latentsea an hour ago \| parent \| prev [-]
		We just have a hook that runs on git push that instructs Claude to ensure the PR description is up to date. Works well enough for us.
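A rough sketch of the ad hoc approach raised in this sub-thread, recording a session summary in the commit message itself. It assumes the summary text has already been produced (by the agent or by hand) and only shows the formatting step, using Git's trailer convention of `Key: value` lines at the end of the message. The trailer keys `Prompt-Summary` and `Agent-Session`, the function name, and the example values are all hypothetical, not taken from any product mentioned here.

```python
import textwrap


def build_commit_message(subject, body, prompt_summary, session_id):
    """Return a commit message whose trailers record the agent session."""
    # A trailer is a single "Key: value" line, so collapse any newlines or
    # repeated whitespace in the summary before writing it.
    one_line_summary = " ".join(prompt_summary.split())
    wrapped_body = textwrap.fill(body, width=72)
    trailers = "\n".join(
        [
            f"Prompt-Summary: {one_line_summary}",  # hypothetical trailer key
            f"Agent-Session: {session_id}",  # hypothetical trailer key
        ]
    )
    # Subject, body, and trailer block are separated by blank lines.
    return "\n\n".join([subject.strip(), wrapped_body, trailers]) + "\n"


message = build_commit_message(
    subject="Add retry to upload client",
    body="Retries failed uploads up to three times with backoff.",
    prompt_summary="Asked for retries on 5xx only;\n  agent chose exponential backoff.",
    session_id="example-session-001",
)
print(message)
```

Because the trailers sit at the end of the message in the standard layout, a reviewer can read them in `git log`, and a team could pick its own keys without changing anything else.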

▲

keybored 26 minutes ago | parent | prev [-]

Flintstone Engineering is applying Space Age synthetic intelligence (in a metaphorical sense) technology with code generation. Babysitting, version controlling, etc. generated code should be a thing of the past. But that is what GenAI is.

At the very least apply it at a higher level: specification, proofs, anything but generating Rust/Java/C and then letting yourself or an agent babysit it.