| ▲ | kace91 5 hours ago |
| >The people really leading AI coding right now (and I’d put myself near the front, though not all the way there) don’t read code. They manage the things that produce code. I can’t imagine any other example where people voluntarily move for a black box approach. Imagine taking a picture on autoshot mode and refusing to look at it. If the client doesn’t like it because it’s too bright, tweak the settings and shoot again, but never look at the output. What is the logic here? Because if you can read code, I can’t imagine poking the result with black box testing being faster. Are these people just handing off the review process to others? Are they unable to read code and hiding it? Why would you handicap yourself this way? |
|
| ▲ | eikenberry 3 hours ago | parent | next [-] |
| I think many people are missing the overall meaning of these sorts of posts: they are describing a new type of programmer that will only use agents and never read the underlying code. These vibe/agent coders will use natural(-ish) language to communicate with the agents and wouldn't look at the code any more than, say, a PHP developer would look at the underlying assembly. It is simply not the level of abstraction they are working at. There are many use cases where this type of coding will work fine, and it will let many people who previously couldn't really take advantage of computers do so. This is great, but it will do nothing to replace the need for code that humans must understand (which, in turn, requires participation in the writing). |
| |
| ▲ | jkhdigital 3 hours ago | parent | next [-] | | Your analogy to PHP developers not reading assembly got me thinking. Early resistance to high-level (i.e. compiled) languages came from assembly programmers who couldn’t imagine that the compiler could generate code that was just as performant as their hand-crafted product. For a while they were right, but improved compiler design and the relentless performance increases in hardware made it so that even an extra 10-20% boost you might get from perfectly hand-crafted assembly was almost never worth the developer time. There is an obvious parallel here, but it’s not quite the same. The high-level language is effectively a formal spec for the abstract machine which is faithfully translated by the (hopefully bug-free) compiler. Natural language is not a formal spec for anything, and LLM-based agents are not formally verifiable software. So the tradeoffs involved are not only about developer time vs. performance, but also correctness. | | |
| ▲ | andai 9 minutes ago | parent | next [-] | | > So the tradeoffs involved are not only about developer time vs. performance, but also correctness. The "now that producing plausible code is free, verification becomes the bottleneck" people are technically right, of course, but I think they're missing the context that very few projects cared much about correctness to begin with. The biggest headache I can see right now is just the humans keeping track of all the new code, because it arrives faster than they can digest it. But I guess "let go of the need to even look at the code" "solves" that problem, for many projects... Strange times! For example -- someone correct me if I'm wrong -- OpenClaw was itself almost entirely written by AI, and the developer bragged about not reading the code. If anything, in this niche, that actually helped the project's success rather than harming it. (In the case of Windows 11 recently... not so much ;) | |
| ▲ | bandrami 24 minutes ago | parent | prev | next [-] | | OK, but I've definitely read the assembly listings my C compiler produced when it wasn't working like I hoped. Even if that's not all that frequent, it's something I expect to have to do from time to time, and it's definitely part of "programming". | |
| ▲ | drawnwren 43 minutes ago | parent | prev | next [-] | | It's also important to remember that vibe coders throw away the natural language spec each time they close the context window. Vibe coding is closer to compiling your code, throwing the source away, and then asking a friend to give you new source that is pretty close to the one you wrote. | |
| ▲ | ytoawwhra92 3 hours ago | parent | prev | next [-] | | For a great many software projects no formal spec exists. The code is the spec, and it gets modified constantly based on user feedback and other requirements that often appear out of nowhere. For many projects, maybe ~80% of the thinking about how the software should work happens after some version of the software exists and is being used to do meaningful work. Put another way, if you don't know what correct is before you start working then no tradeoff exists. | |
| ▲ | HansHamster 2 hours ago | parent | prev | next [-] | | > which is faithfully translated by the (hopefully bug-free) compiler. "Hey Claude, translate this piece of PHP code into Power10 assembly!" | |
| ▲ | QuadmasterXLII 2 hours ago | parent | prev [-] | | Imagine if high-level coding worked like this: write a first draft, and get assembly. All subsequent high-level code is written in a REPL and expresses changes to the assembly, or queries the state of the assembly, and is then discarded. Only the assembly is checked into version control. |
| |
| ▲ | straydusk 3 hours ago | parent | prev | next [-] | | I'm glad you wrote this comment because I completely agree with it. I'm not saying there's no need for software engineers who deeply consider architecture, who can fully understand the truly critical systems that exist at most software companies, and who can help dream up the harness capabilities to make these agents work better. I'm just describing what I'm doing now, and what I'm seeing at the leading edge of using these tools. It's a different approach, but I think it'll become the most common way of producing software. | |
| ▲ | re-thc 3 hours ago | parent | prev [-] | | > they are describing a new type of programmer that will only use agents and never read the underlying code > and wouldn't look at the code any more than, say, a PHP developer would look at the underlying assembly This really puts down the work that the PHP maintainers have done. Many people spent a lot of time crafting the PHP codebase so you don't have to look at the underlying assembly. There is a certain amount of trust that I, as a PHP developer, assume. Is this what the agents do? No. They scrape random bits of code from everywhere and put something together with no craft. How do I know they won't hide exploits somewhere? How do I know they don't leak my credentials? |
|
|
| ▲ | csallen 4 hours ago | parent | prev | next [-] |
| > Imagine taking a picture on autoshot mode and refusing to look at it. If the client doesn’t like it because it’s too bright, tweak the settings and shoot again, but never look at the output. The output of code isn't just the code itself, it's the product. The code is a means to an end. So the proper analogy isn't the photographer not looking at the photos, it's the photographer not looking at what's going on under the hood to produce the photos. Which, of course, is perfectly common and normal. |
| |
| ▲ | kace91 4 hours ago | parent | next [-] | | >The output of code isn't just the code itself, it's the product. The code is a means to an end. I’ll bite. Is this person manually testing everything that one would normally unit test? Or writing black-box tests that he knows are correct because they were written by hand? If not, you’re not reviewing the product either. If yes, it’s less time-consuming to actually read and test the damn code. | | |
| ▲ | CuriouslyC 4 hours ago | parent [-] | | I mostly ignore code, I lean on specs + tests + static analysis. I spot check tests depending on how likely I think it is for the agent to have messed up or misinterpreted my instructions. I push very high test coverage on all my projects (85%+), and part of the way I build is "testing ladders" where I have the agent create progressively bigger integration tests, until I hit e2e/manual validation. | | |
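As a concrete sketch, a "testing ladder" like the one described might look like the following in a Python project. The functions and the coverage bar enforced via a pytest flag are illustrative assumptions, not details from the comment:

```python
# Minimal sketch of a "testing ladder": each rung exercises progressively
# more of the stack. In a real project a runner like pytest would collect
# these, with coverage enforced via: pytest --cov --cov-fail-under=85

def parse_price(raw: str) -> int:
    """Toy unit under test: convert a string like '$1,234' to cents."""
    return int(raw.strip().lstrip("$").replace(",", "")) * 100

def invoice_total(raw_prices) -> int:
    """Toy component built on top of the unit above."""
    return sum(parse_price(p) for p in raw_prices)

# Rung 1: fast unit test, run on every agent iteration.
def test_parse_price():
    assert parse_price("$1,234") == 123400

# Rung 2: bigger integration test combining several units.
def test_invoice_total():
    assert invoice_total(["$10", "$5"]) == 1500

# Rung 3 (e2e/manual validation) happens outside this file.
if __name__ == "__main__":
    test_parse_price()
    test_invoice_total()
```

The point of the ladder is that most agent mistakes are caught cheaply at the lower rungs before anything reaches e2e or manual validation.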
| ▲ | kace91 4 hours ago | parent | next [-] | | >I spot check tests depending on how likely I think it is for the agent to have messed up or misinterpreted my instructions So a percentage of your code, chosen by gut feeling, is left unseen by any human by the time you submit it. Do you agree that this raises the chance of bugs slipping by? I don’t see how you wouldn’t. And considering that your code output is larger, the percentage of it that is buggy is larger, and (presumably) you ship faster, have you considered what that compounds to in terms of the likelihood of incidents? | |
| ▲ | CuriouslyC 2 hours ago | parent [-] | | There's definitely a class of bugs that are a lot more common, where the code deviates from the intent in some subtle way, while still being functional. I deal with this using benchmarking and heavy dogfooding, both of these really expose errors/rough edges well. |
| |
| ▲ | straydusk 4 hours ago | parent | prev [-] | | "Testing ladders" is a great framing. My approach is similar. I invest in the harness layer (tests, hooks, linting, pre-commit checks). The code review still happens; it just happens through tooling rather than my eyeballs. |
|
| |
| ▲ | straydusk 4 hours ago | parent | prev | next [-] | | Exactly this. The code is an intermediate artifact - what I actually care about is: does the product work, does it meet the spec, do the tests pass? I've found that focusing my attention upstream (specs, constraints, test harness) yields better outcomes than poring over implementation details line by line. The code is still there if I need it. I just rarely need it. | |
| ▲ | alanbernstein 4 hours ago | parent | prev | next [-] | | Right, it seems the appropriate analogy is the shift from analog-photograph-developers to digital camera photographers. | |
| ▲ | add-sub-mul-div 4 hours ago | parent | prev [-] | | A photo isn't going to fail next week or three months from now because it's full of bugs no one's triggered yet. Specious analogies don't help anything. |
|
|
| ▲ | andyferris 44 minutes ago | parent | prev | next [-] |
| The output is the program behavior. You use it, like a user, and give feedback to the coding agent. If the app is too bright, you tweak the settings and build it again. Photography used to involve developing film in dark rooms. Now my iPhone does... god knows what to the photo - I just tweak in post, or reshoot. I _could_ get the raw, understand the algorithm to transform that into sRGB, understand my compression settings, etc - but I don't need to. Similarly, I think there will be people who create useful software without looking at what happens in between. And there will still be low-level software engineers for whom what happens in between is their job. |
|
| ▲ | CharlesW 4 hours ago | parent | prev | next [-] |
| AI-assisted coding is not a black box in the way that managing an engineering team of humans is. You see the model "thinking", you see diffs being created, and occasionally you intervene to keep things on track. If you're leveraging AI professionally, any coding has been preceded by planning (the breadth and depth of which scale with the task) and test suites. |
|
| ▲ | Aeolun 4 hours ago | parent | prev | next [-] |
| > What is the logic here? It is right often enough that your time is better spent testing the functionality than the code. Sometimes it’s not right, and you need to re-instruct (often) or dive in (not very often). |
| |
| ▲ | kace91 4 hours ago | parent [-] | | I can’t imagine retesting all the functionality of a well-established product for possible regressions not being stupidly time-consuming. This is the very reason we have unit tests in the first place, and why they far outnumber end-to-end tests. |
|
|
| ▲ | weikju 4 hours ago | parent | prev | next [-] |
| Don’t read the code, test for desired behavior, miss out on all the hidden undesired behavior injected by malicious prompts or AI providers. Brave new world! |
| |
| ▲ | thefz 4 hours ago | parent [-] | | You made me imagine AI companies maliciously injecting backdoors in generated code no one reads, and now I'm scared. | | |
| ▲ | gibsonsmog 4 hours ago | parent | next [-] | | My understanding is that it's quite easy to poison the models with inaccurate data, so I wouldn't be surprised if this exact thing has happened already. Maybe not by an AI company itself, but it's definitely within the means of a hostile actor to create bad code for this purpose. I suppose it's kind of already happened via supply-chain attacks using AI-generated package names that didn't exist prior to the LLM generating them. | |
| ▲ | djeastm an hour ago | parent | prev | next [-] | | One mitigation might be to use one company's model to check the work of another company's model, and depend on market competition to keep the checks and balances. | |
| ▲ | bandrami 26 minutes ago | parent | prev [-] | | Already happening in the wild |
|
|
|
| ▲ | manmal 4 hours ago | parent | prev | next [-] |
| > I can’t imagine any other example where people voluntarily move for a black box approach. Anyone overseeing work from multiple people has to? At some point you have to let go and trust people’s judgement, or, well, let them go. Reading and understanding the whole output of 9 concurrently running agents is impossible. People who do that (I’m not one of them, btw) must rely on higher-level reports, maybe drilling into this or that piece of code occasionally. |
| |
| ▲ | kace91 4 hours ago | parent | next [-] | | >At some point you have to let go and trust people’s judgement. Indeed. People. With salaries, general intelligence, a stake in the matter, and a negative outcome if they don’t take responsibility. >Reading and understanding the whole output of 9 concurrently running agents is impossible. I agree. It is also impossible for a person to drive two cars at once… so we don’t. Why is the starting point of the conversation that one should be able to use 9 concurrent agents? I get it, writing code no longer has a physical bottleneck. So the bottleneck becomes the next thing, which is our ability to review outputs. It’s already a giant advancement; why are we ignoring that second bottleneck and dropping quality assurance as well? Eventually someone has to put their signature on the thing being shippable. | | |
| ▲ | wtetzner 22 minutes ago | parent | next [-] | | Is reviewing outputs really more efficient than writing the code? Especially if it's a code base you haven't written code in? | |
| ▲ | an hour ago | parent | prev [-] | | [deleted] |
| |
| ▲ | ink_13 4 hours ago | parent | prev | next [-] | | An AI agent cannot be held accountable | | | |
| ▲ | re-thc 4 hours ago | parent | prev [-] | | > Anyone overseeing work from multiple people has to? That's not a black box though. Someone is still reading the code. > At some point you have to let go and trust people‘s judgement Where's the people in this case? > People who do that (I‘m not one of them btw) must rely on higher level reports. Does such a thing exist here? Just "done". | | |
| ▲ | manmal 4 hours ago | parent [-] | | > Someone is still reading the code. But you are not. That’s the point? > Where's the people in this case? Juniors build worse code than Codex. Their superiors also can’t check everything they do. They need to have some tolerance for juniors doing dumb shit, or they can’t hire juniors. > Does such a thing exist here? Just "done". Not sure what you mean. You can definitely ask the agent what it built, why it built it, and what could be improved. You will get only part of the info vs. reading the output yourself, but it won’t be zero info. |
|
|
|
| ▲ | Xirdus 4 hours ago | parent | prev | next [-] |
| > I can’t imagine any other example where people voluntarily move for a black box approach. I can think of a few. The last 78 pages of any 80-page business analysis report. The music tracks of those "12 hours of chill jazz music" YouTube videos. Political speeches written ahead of time. Basically - anywhere that a proper review is more work than the task itself, and the quality of output doesn't matter much. |
| |
| ▲ | ink_13 4 hours ago | parent [-] | | So... things where the producer doesn't respect the audience? Because any such analysis would be worth as much as a 4.5 hour atonal bass solo. | | |
|
|
| ▲ | straydusk 4 hours ago | parent | prev | next [-] |
| No pun intended, but it's been more "vibes" than science that led me to work this way. It's simply more effective. When I focus my attention on the harness layer (tests, hooks, checks, etc.) and on the inputs, my overall velocity improves relative to reading and debugging the code directly. To be fair, it is not accurate to say I absolutely never read the code; it's just rare, much more the exception than the rule. My workflow simply focuses on the final product and the initial input layer rather than the code, which is becoming less consequential. |
|
| ▲ | bloomca 2 hours ago | parent | prev | next [-] |
| I think this is the logical next step -- instead of manually steering the model, just rely on the acceptance criteria and some E2E test suite (that part is tricky, since you need to verify the suite itself). I personally think we are not that far from it, but it will need something built on top of current CLI tools. |
|
| ▲ | raincole 4 hours ago | parent | prev | next [-] |
| > Because if you can read code, I can’t imagine poking the result with black box testing being faster. I don't know... it depends on the use case. I can't imagine that even the best front-end engineer can read HTML faster than they can glance at the rendered webpage to check whether the layout is correct. |
|
| ▲ | AlexCoventry 3 hours ago | parent | prev | next [-] |
| > What is the logic here? Because if you can read code, I can’t imagine poking the result with black box testing being faster. It's producing seemingly working code faster than you can closely review it. |
| |
| ▲ | kace91 3 hours ago | parent [-] | | Your car can also move faster than what you can safely control. Knowing this, why go pedal to the metal? |
|
|
| ▲ | seanmcdirmid 4 hours ago | parent | prev | next [-] |
| > What is the logic here? Because if you can read code, I can’t imagine poking the result with black box testing being faster. The AI also writes the black box tests, what am I missing here? |
| |
| ▲ | kace91 3 hours ago | parent [-] | | >The AI also writes the black box tests, what am I missing here? If the AI misinterpreted your intentions and/or missed something in production code, the tests are likely to reproduce rather than catch that behavior. In other words, if “the AI is checking as well” no one is. | |
| ▲ | seanmcdirmid 3 hours ago | parent [-] | | That's true. For sure, never let the AI writing a test see the code it's testing. Write multiple tests, and have an arbitrator (also AI) figure out whether the implementation or the tests are wrong when tests fail. Have the AI heavily comment code and heavily comment tests in the language of your spec, so you can manually verify whether the scenarios/parts of the implementation make sense when it matters. Etc., etc. > In other words, if “the ai is checking as well” no one is. "I tried nothing, and nothing at all worked!" |
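The loop being described (independently generated tests, plus an arbitrator when they disagree) can be sketched as plain control flow. Here `implement`, `write_tests`, and `judge` are hypothetical callables standing in for separate model contexts; how you wire them to an actual model API is an assumption left open:

```python
# Sketch of the "independent tests + arbitrator" loop: implementation and
# tests come from contexts that never see each other's output, and a third
# context breaks ties when they conflict.

def arbitrate(spec, implement, write_tests, run_tests, judge, max_rounds=3):
    impl = implement(spec)       # context A: sees only the spec
    tests = write_tests(spec)    # context B: sees only the spec, never impl
    for _ in range(max_rounds):
        failures = run_tests(impl, tests)
        if not failures:
            return impl, tests
        # context C decides which side is wrong, then that side regenerates
        if judge(spec, impl, tests, failures) == "implementation":
            impl = implement(spec)
        else:
            tests = write_tests(spec)
    raise RuntimeError("could not reconcile implementation and tests")
```

The structure matters more than the stubs: because the test-writing context never sees the implementation, a misinterpretation has to happen twice, independently, to slip through.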
|
|
|
| ▲ | hjoutfbkfd 4 hours ago | parent | prev | next [-] |
| Your metaphor is wrong. Code is not the output; functionality is the output, and you do look at that. |
| |
| ▲ | kace91 3 hours ago | parent [-] | | Explain then how testing the functionality (not just the new functionality; regressions included, this is not a school exercise) is faster than checking the code. Are you writing black-box tests by hand, or manually checking, everything that would normally be a unit test? We have unit tests precisely because of how unworkable the “every test is black box” approach is. |
|
|
| ▲ | ForHackernews 4 hours ago | parent | prev | next [-] |
| >Imagine taking a picture on autoshot mode Almost everyone does this. Hardly anyone taking pictures understands what f-stop or focal length are. Even those who do seldom adjust them. There are dozens of other examples where people voluntarily move to a black box approach. How many Americans drive a car with a manual transmission? |
| |
| ▲ | weikju 4 hours ago | parent [-] | | You missed the rest of the analogy though, which is the part where the photo is never reviewed before being handed over to the client. |
|
|
| ▲ | notepad0x90 4 hours ago | parent | prev [-] |
| People care about results, and better processes need to produce better results. This is programming, not a belief system where you have to adhere to some view or else. |