| ▲ | czhu12 9 hours ago |
| The problem I’ve been having is that when Claude generates copious amounts of code, it’s way harder to review than small snippets one at a time. Some would argue there’s no point reviewing the code: just test the implementation, and if it works, it works. I’m still kind of nervous doing this in critical projects. Does anyone just YOLO code for projects that aren’t meant to be one-off, but are fully intended to be supported for a long time? What are your learnings after 3-6 months of supporting them in production? |
|
| ▲ | serial_dev 8 hours ago | parent | next [-] |
| In a professional setting where you still have coding standards, people will review your code, and the code actually reaches hundreds of thousands of real users, handling one agent at a time is plenty for me. The code output is never good enough, and it makes stuff up even for moderately complicated debugging ("Oh, I can clearly see the issue now"; I've heard it ten times before, and you were always wrong!). I do use them, though: they help me search, understand, narrow down, and ideate. It's still a better Google, and the experience is getting better every quarter, but people letting tens or hundreds of agents just rip... I can't imagine doing it. For personal throwaway projects that you do because you want the end output (as opposed to learning or caring), sure, do it: verify it roughly works and be done with it. |
| |
| ▲ | pron 6 hours ago | parent | next [-] | | This is my problem with the whole "can LLMs code?" discussion. Obviously, LLMs can produce code, well even, much like a champion golfer can get a hole in one. But can they code in the sense of "the pilot can fly the plane", i.e. barring a catastrophic mechanical malfunction or a once-in-a-decade weather phenomenon, the pilot will get the plane to its destination safely? I don't think so. To me, someone who can code means someone who (unless they're in a detectable state of drunkenness, fatigue, illness, or distraction) will successfully complete a coding task commensurate with some level of experience or, at the very least, explain why exactly the task is proving difficult. While I've seen coding agents do things that truly amaze me, they also make mistakes that no one who "can code" ever makes. If you can't trust an LLM to complete a task anyone who can code will either complete or explain their failure, then it can't code, even if it can (in the sense of "a flipped coin can come up heads") sometimes emit impressive code. | | |
| ▲ | lupire an hour ago | parent [-] | | That's a funny analogy. You should look into how modern planes are flown. Hint: it's a computer. | | |
| ▲ | pron an hour ago | parent [-] | | > Hint: it's a computer. Not quite, but in any event none of the avionics is an LLM or a program generated by one. |
|
| |
| ▲ | prmoustache 6 hours ago | parent | prev | next [-] | | > people will review your code, People will ask an LLM to review some slop made by an LLM, and they will be absolutely right! There is no limit to laziness. | | |
| ▲ | flemhans 3 hours ago | parent [-] | | Soon you'll be seen as irresponsible and wasteful if you don't let the smarter LLM do it. |
| |
| ▲ | KaiserPro 7 hours ago | parent | prev [-] | | > people will review your code, I mean, you'd think. But it depends on the motivations. At Meta, we had league tables for reviewing code. Even then, people only really looked at it if a) they were a nitpicking shit, b) they didn't like you and wanted to piss on your chips, or c) it's another team trying to fix our shit. With the internal Claude rollout and the drive to vibe code all the things, I'm not sure that situation has gotten any better. Fortunately it's not my problem anymore. | |
| ▲ | serial_dev 6 hours ago | parent [-] | | Well, it certainly depends on the culture of the team and organization. Where you have shared ownership (meaning once I approve your PR, I am just as responsible as you are if something goes wrong, and I can be expected to understand it just as well as you do), your code will get reviewed. If shipping is the number one priority, the team is really just a group of individuals working to meet their quota, everyone wants to simply ship their stuff, and managers pressure managers to constantly put pressure on the devs, you'll get your PR rubber-stamped after 20s of review. Why would I spend hours trying to understand what you did when I could work on my own stuff? And yes, these tools make this 100x worse: people don't understand their fixes, code standards are no longer relevant, and you are expected to ship 10x faster, so it's all just slop from here on. |
|
|
|
| ▲ | gen220 8 hours ago | parent | prev | next [-] |
| In my (admittedly conflict-of-interest; I work for Graphite/Cursor) opinion, asking CC to stack changes and then having an automated reviewer agent weigh in helps a lot with digesting and building conviction in otherwise-large changesets. My "first pass" of review is usually me reading the PR stack in Graphite. I might iterate on the stack a few times with CC before publishing it for review. I have agents generate much of my code, but this workflow has allowed me to retain ownership/understanding of the systems I'm shipping. |
|
| ▲ | AstroBen 8 hours ago | parent | prev | next [-] |
| I think we'll start to see the results of that late this year, but it's a little early yet. Plenty of people are diving headfirst into it. To me it feels like building your project on sand. Not a good idea unless it's a sandcastle. |
|
| ▲ | linsomniac 7 hours ago | parent | prev | next [-] |
| I have Claude Code author changes, and then I use a "codex-review" skill I wrote that reviews the last commit. You might try asking Codex (or whatever) to review the change and give you some pointers to focus on in your own review; you can then see whether Codex was on track or missed anything, and maybe feed that back into your codex-review prompt. |
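For what it's worth, the "review the last commit" step can be sketched roughly like this. This is a hypothetical illustration, not the actual skill; `send_to_agent` is a stand-in for whatever agent CLI or API you use.

```python
import subprocess

def last_commit_patch() -> str:
    # `git show HEAD` prints the last commit's message plus its full diff.
    return subprocess.run(
        ["git", "show", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout

def build_review_prompt(patch: str) -> str:
    # Ask for pointers to focus the human review on, not a verdict.
    return (
        "Review this commit. List concrete risks and the spots a human "
        "reviewer should focus on:\n\n" + patch
    )

# send_to_agent(build_review_prompt(last_commit_patch()))  # codex, claude, etc.
```

The useful part is the feedback loop: after your own pass, note what the agent missed and fold that into the prompt for next time.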
|
| ▲ | idontwantthis 8 hours ago | parent | prev | next [-] |
| I just can’t get on board with this. There is so much beyond “works” in software. There are requirements that you didn’t know about and breaking scenarios that you didn’t plan for, and if you don’t know how the code works, you’re not going to be able to fix it. Even assuming an AI could fix any problem given a good enough prompt, I can’t write that prompt without sufficient knowledge and experience in the codebase. I’m not saying they are useless, but I cannot just prompt, test, and ship a multiservice, asynchronous, multi-DB, zero-downtime app. |
| |
| ▲ | zmmmmm 3 hours ago | parent | next [-] | | Yes, this is one of my concerns. Usually about 50% of my understanding of the domain comes from the process of building the code. I can see a scenario where large-scale automated code works for a while but then quickly becomes unsupportable because the domain expertise isn't there to drive it. People are currently working off their pre-existing domain knowledge, which is what allows them to rapidly and accurately express in a few sentences what an AI should do and then give decisive feedback on it. The best counter-argument is that AIs can explain the existing code and domain almost as well as they can code it to begin with, so there is a reasonable prospect that the whole system can sustain itself. But there is no denying this is a huge experiment. Any company producing enormous amounts of code that nobody understands is well out over their skis and could easily find itself a year or two down the track with huge issues. | |
| ▲ | atonse 7 hours ago | parent | prev [-] | | I don’t know what your stack is, but at least with elixir and especially typescript/nextJS projects, and properly documenting all those pieces you mentioned, it goes a long way. You’d be amazed. | | |
| ▲ | idontwantthis 5 hours ago | parent | next [-] | | If it involves Next.js then we aren't talking about the same category of software. Yes, it can make a website pretty darn well. Can it debug and fix excessive database connection creation in a way that won't make things worse? Maybe, but more often not, and that's why we are engineers and not craftsmen. That example is from a recent bug I fixed without Cursor being able to help. It wanted to create a wrapper around the pool class that would have blocked all threads until a connection was free. Bug fixed! App broken! | |
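For the curious, the failure mode described above can be sketched in a few lines. This is a hypothetical illustration of that kind of wrapper, not the actual Cursor output:

```python
import threading

# Hypothetical sketch of the "fix" described above: a wrapper that caps
# connection creation by making every caller wait for a free slot. It does
# stop the excessive connection churn -- and under load it also stops the
# app, because threads queue on the semaphore instead of doing work.
class BlockingPoolWrapper:
    def __init__(self, max_connections: int):
        self._slots = threading.Semaphore(max_connections)

    def acquire(self, timeout=None) -> bool:
        # Returns False only if a timeout is given and expires;
        # with no timeout, a caller can block forever.
        return self._slots.acquire(timeout=timeout)

    def release(self) -> None:
        self._slots.release()

pool = BlockingPoolWrapper(max_connections=1)
assert pool.acquire()                  # first caller gets the only slot
assert not pool.acquire(timeout=0.05)  # second caller blocks, then gives up
pool.release()
```

Capping connections this way just moves the queue from the database into your thread pool; the actual fix usually means finding what is leaking or over-creating connections in the first place.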
| ▲ | mrtesthah 6 hours ago | parent | prev [-] | | I would never use, let alone pay for, a fully vibe-coded app whose implementation no human understands. Whether you're reading a book or using an app, you're communicating with the author by way of your shared humanity, in how they anticipate what you're thinking as you explore the work. The author incorporates and plans for those predicted reactions and thoughts where it makes sense. Ultimately, the author is conveying an implicit mental model to the reader. The first problem is that many of these pathways and edge cases aren't apparent until the actual implementation, and sometimes in the process the author realizes that the overall app would work better if it were re-specified from the start. This opportunity is lost without a hands-on approach. The second problem is that the less human touch there is, the less consistent the mental model conveyed to the user is going to be, because a specification and a collection of prompts do not constitute a mental model. This can create subconscious confusion and cognitive friction when interacting with the work. | |
| ▲ | andrekandre 40 minutes ago | parent [-] | | > The second problem is that the less human touch there is, the less consistent the mental model conveyed to the user is going to be, because a specification and a collection of prompts do not constitute a mental model. This can create subconscious confusion and cognitive friction when interacting with the work.
tbf, this is a trend i see more and more across the industry, llm or not: so many processes get automated that teams just implement x cause pdm y said so, and it's because they need to meet goal z for the quarter... everyone is on scrum autopilot and they can't see the forest for the trees anymore. i feel like the massive automation afforded by these coding agents may make this worse |
|
|
|
|
| ▲ | chasing 5 hours ago | parent | prev | next [-] |
| Yeah, it's not just my job to generate the code: It's my job to know the code. I can't let code out into the wild that I'm not 100% willing to vouch for. |
| |
| ▲ | zmmmmm 3 hours ago | parent [-] | | At a higher level, it goes beyond that: it's my job to take responsibility for the code. At some fundamental level that puts a limit on how productive AI can be, because we can only produce code as fast as the people taking responsibility can carry out whatever due-diligence process they need. In a lot of jurisdictions, human-in-the-loop, line-by-line review is being mandated for code developed in regulatory settings. That pretty much caps output at the rate of human review, which is, to be honest, not drastically faster than coding itself anyway (often I might invest 30% of the time to review a change that the developer took to make it). It means there is no value in producing more code, only value in producing better, clearer, safer code that can be reasoned about by humans. Which in turn makes me very sceptical about agents, other than as a useful parallelisation mechanism akin to multiple developers working on separate features. But in terms of ramping up the level of automation, it's frankly kind of boring to me, because if anything it makes the review part harder, which actually slows us down. | |
|
|