| ▲ | herrkanin 9 hours ago |
| Your argument is just as applicable to human code reviewers. Obviously having others review the code will catch issues you would never have thought of. This includes agents as well. |
|
| ▲ | kneel25 8 hours ago | parent | next [-] |
| They’re not equal. Humans are capable of actually understanding and looking ahead at the consequences of decisions, whereas an LLM can’t. One is a review; the other mimics the result of a hypothetical review without any of the actual reasoning. (And prompting itself in a loop is not real reasoning.) |
| |
| ▲ | iamleppert 5 hours ago | parent [-] | | I keep hearing people say "but as humans we actually understand". What evidence do you have of a material difference between the understanding an LLM has and the version a human has? What processes do we fundamentally perform that an LLM does not or cannot? And what is the definition of "understanding" here, something that humans do but an LLM presumably does not? | | |
| ▲ | kneel25 2 hours ago | parent | next [-] | | Well, one material difference is that we don’t input/output in tokens, I guess. We have a concept of gaps and limits in our knowledge, and factors like ego, self-preservation, and ambition feed into our thoughts, whereas an LLM just has raw data. Understanding the implications of a code change means having an idea of a desired structure, some sense of where you want to head and how it all meshes together. An LLM has none of that. Just because it can copy the output that results from those factors doesn’t mean the two operate the same way. | |
| ▲ | mcpar-land 4 hours ago | parent | prev [-] | | https://ml-site.cdn-apple.com/papers/the-illusion-of-thinkin... |
|
|
|
| ▲ | Fervicus 9 hours ago | parent | prev | next [-] |
| With humans though, I wouldn't have to review 20k lines of code at once. |
| |
| ▲ | glhaynes 8 hours ago | parent [-] | | So ask the AI to just translate one little chunk at a time, right? | | |
|
|
| ▲ | DetroitThrow 9 hours ago | parent | prev [-] |
>Your argument is just as applicable on human code reviewers. The tests many of us use for how capable a model or harness is are usually based on whether it can spot logical errors readily visible to humans. Hence: https://news.ycombinator.com/item?id=47031580 |