jnovek 3 hours ago

> Able to review the code output of coding agents

That probably won’t be necessary in a few years.

circlefavshape 3 hours ago | parent | next [-]

It's necessary for devs right now, no matter how good they are, and it's those devs' code the models are trained on

prewett 2 hours ago | parent [-]

Even worse, the training set probably includes a lot of code that needed review but didn't get it...

keeda 34 minutes ago | parent [-]

If we know the outcome of that code, such as whether it caused bugs or data corruption or a crappy UX or tech debt -- which is potentially available in subsequent PR commit messages -- it's still valuable training data. Probably even more valuable than code that just worked, because evidently we have enough of that and AI code still has issues.

rafterydj 3 hours ago | parent | prev | next [-]

I see this line of thought put out there many times, and I've been thinking: why do people do anything at all? What's the point? If no one at all is even reviewing the output of coding agents, genuinely, what are we doing as a society?

I fail to see how we transition society into a positive future without supplying means of verifying systemic integrity. There is a reason that Upton Sinclair became famous: wayward incentives behind closed doors generally cause subpar standards, which cause subpar results. If the FDA didn't exist, or they didn't "review the output", society would be materially worse off. If the whole pitch for AI ends with "and no one will even need to check anything" I find that highly convenient for the AI industry.

ndriscoll 3 hours ago | parent [-]

You could e.g. write specs and only review high level types plus have deterministic validation that no type escapes/"unsafe" hatches were used, or instruct another agent to create adversarial blackbox attempts to break functionality of the primary artifact (which is really just to say "perform QA").
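A minimal sketch of what the deterministic "no escape hatches" check could look like, assuming a plain textual scan is acceptable. The pattern list and the names `ESCAPE_HATCHES`, `find_escape_hatches`, and `is_clean` are hypothetical, not taken from any particular tool, and a real gate would more likely use the compiler or a linter, since a regex scan also flags matches inside comments and strings.

```python
import re

# Hypothetical escape-hatch patterns; adjust for the language and codebase in use.
ESCAPE_HATCHES = {
    "rust-unsafe": re.compile(r"\bunsafe\b"),
    "ts-any-cast": re.compile(r"\bas\s+any\b"),
    "ts-ignore": re.compile(r"@ts-(?:ignore|nocheck)\b"),
}


def find_escape_hatches(source: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every escape hatch found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in ESCAPE_HATCHES.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def is_clean(source: str) -> bool:
    """True when the source contains none of the listed escape hatches."""
    return not find_escape_hatches(source)
```

Run as a CI step, a check like this would fail the build whenever `is_clean` returns `False`, so the reviewer only needs to read the specs and high-level types.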

As a simple use-case, I've found LLMs to be much better than me at macro programming, and I don't really need to care about what it does because ultimately the constraint is just that it bends the syntax I have into the syntax I want, and things compile. The details are basically irrelevant.

surajrmal 2 hours ago | parent | next [-]

Code quality will impact the effectiveness of AI. Having less code to read and modify in subsequent changes is still useful. For a while I became more of a paper architect and stopped coding, and I realized I could no longer do sufficient code reviews because I lacked context. I went back into the code at some point, realized the mess my team was making, and spent a long while cleaning it up. This improved the productivity of everyone involved. I expect AI to fall into a similar predicament. Without first-hand knowledge of the implementation details, we won't know about the problems we need to tell the AI to address. There are also many systems that are constrained in terms of memory and compute, and more code likely puts you up against those limits.

ndriscoll an hour ago | parent [-]

I don't disagree that code quality is currently more important than it's ever been (to get the most out of the tools). I expect that quality will increase though as people refine either training or instructions. I was able to get much better (well factored, aligned to business logic) output that I'm generally happy-ish with a couple months ago with some coding guidelines I wrote. It's possible that newer models don't even need that, but they work well enough with it that I haven't touched those instructions since.

rafterydj 2 hours ago | parent | prev [-]

I mean, sure, for programming macros. Or programming quick scripts, or type-safe or memory-safe programs. Or web frontends, or a11y, or whatever tasks for which people are using AI.

But if you peel back that layer to the point where you are no longer discussing the code, and just saying "code X that does Y"... how big is X going to get without verifying it? This is a basic, fundamental question that gets deflected by evaluating each case where AI is useful.

When you stop being specific about what the AI is doing, and switch to the general tense, there is a massive and obvious gap that nobody is adequately addressing. I don't think anyone would say that details are irrelevant in the case of life-threatening scenarios, and yet no one is acknowledging where the logical end to this line of thinking goes.

falkensmaize 3 hours ago | parent | prev [-]

They will still be turning out the same problematic code in a few years that they do now, because they aren’t intelligent and won’t be intelligent unless there is a fundamental paradigm shift in how an LLM works.

I use LLMs with best practices to program professionally in an enterprise every day, and even Opus 4.6 still consistently makes some of the dumbest architectural decisions, even with full context, complete access to the codebase and me asking very specific questions that should point it in the right direction.

stevepotter 2 hours ago | parent [-]

I keep hearing that “they aren’t intelligent” and that they spit out “crap code”. That’s not been my experience. LLMs prevented and also caught intricate concurrency issues that would have taken me a long time to find. I just went “hmmm, nice” and went on. The problem there is that I didn’t get that sense of accomplishment I crave and I really didn’t learn anything. Those are “me” problems but I think programmers are collectively grappling with this.