indymike 5 hours ago

Because of the scale of generated code, often it is the AI verifying the AI's work.

ptnpzwqd an hour ago | parent | next [-]

I of course cannot say what the future holds, but current frontier models are, in my experience, nowhere near good enough for that kind of autonomy.

Even with other agents reviewing the code, good test coverage, etc., smaller mistakes - and every now and then larger ones - make their way through, and the existence of such mistakes in the codebase tends to accelerate the introduction of even more of them.

It for sure depends on many factors, but I have seen enough to feel confident that we are not there yet.

tartoran 5 hours ago | parent | prev [-]

So who's verifying the AI that's doing the verifying, or is that yet another AI layer? And if something goes wrong, who's liable, the AI?

visarga 4 hours ago | parent [-]

You have two paths: code tests, and AI review, which is just a vibe test of the LGTM kind. You should use both in tandem. Code testing is cheap to run, and you can build more complex systems if you apply it well. But ultimately it is the user, or usage, that needs to direct testing, or you pay the price for formal verification. Most of the time it is usage: time passing reveals failure modes, and hindsight is 20/20.
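To illustrate the "cheap to run" side of that tandem, here is a minimal sketch of a deterministic test gate for generated code. The function under test (`parse_price`) and its behavior are hypothetical examples standing in for AI-generated code; the point is that the assertions encode requirements independently of whatever model wrote the implementation, unlike an LGTM-style review:

```python
# Minimal sketch: a cheap, deterministic test gate for generated code.
# parse_price is a hypothetical stand-in for an AI-generated function;
# the test encodes the behavior we actually require, so it catches
# regressions regardless of which model (or human) wrote the code.

def parse_price(text: str) -> float:
    """Hypothetical generated function: parse a string like '$1,234.56'."""
    return float(text.replace("$", "").replace(",", ""))

def test_parse_price() -> None:
    assert parse_price("$1,234.56") == 1234.56
    assert parse_price("99") == 99.0
    assert parse_price("$0.50") == 0.50

if __name__ == "__main__":
    test_parse_price()
    print("all tests passed")
```

A test like this runs in milliseconds on every change, which is what makes it a practical complement to the slower, fuzzier AI-review pass.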