Somehow they did use this as part of their approach to get to 0 regressions across 65k tests, with no performance regressions and identical output for both AST and bytecode. How much manual review went into the hundreds of rounds of prompt steering is not stated, but I don't think you can say it couldn't find any deep logical errors along the way when it still achieved those results.

What concerns me is whether this cleanup will actually come in time or not:

> The Rust code intentionally mimics things like the C++ register allocation patterns so that the two compilers produce identical bytecode. Correctness is a close second. We know the result isn’t idiomatic Rust, and there’s a lot that can be simplified once we’re comfortable retiring the C++ pipeline. That cleanup will come in time.

Of course, it wouldn't be the first time Andreas delivered more than I expected :).

kneel25 2 hours ago | parent [-]

That’s convincing and impressive, but I wouldn’t say it proves it can spot deep errors. If it’s incredible at porting files and comparing against the source of truth, then its ability to find complicated issues isn’t being tested imo.

	zamadatix 3 minutes ago | parent [-]
		If completing the above successfully doesn't necessarily test these abilities, then where does the concern about its capability to do so come into play?