Weirdly, not only can you do this, it somehow does actually catch some of its own mistakes.

Not all of the mistakes: these models generally still have a performance ceiling below that of human experts (though even this disclaimer is a simplification). But this kind of self-critique is basically what gives the early "reasoning" models their edge over simple chat models: for the first n :END: tokens, replace each with "wait" and watch the model attempt other solutions and pick something usually better.
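To make the "wait" trick concrete, here is a minimal sketch of the decoding loop. It is not any particular model's implementation: `generate_with_wait`, `toy_step`, and the scripted attempts are hypothetical names invented for illustration, and the "model" is a stub that just produces one canned attempt per "wait" it has seen.

```python
from typing import Callable, List

END = ":END:"  # end-of-answer marker, written the way the comment above writes it


def generate_with_wait(step: Callable[[List[str]], str], prompt: List[str],
                       n_forced: int, max_tokens: int = 200) -> List[str]:
    """Decode token by token. The first n_forced times the model emits END,
    append "wait" instead so it has to keep going."""
    tokens = list(prompt)
    forced = 0
    while len(tokens) - len(prompt) < max_tokens:
        tok = step(tokens)
        if tok == END:
            if forced < n_forced:
                tokens.append("wait")  # suppress the stop, nudge a retry
                forced += 1
                continue
            tokens.append(END)
            break
        tokens.append(tok)
    return tokens


def toy_step(tokens: List[str]) -> str:
    """Hypothetical stand-in for a model: one scripted attempt per "wait" seen."""
    attempts = ["2+2=5", "2+2=4"]
    if tokens[-1] in attempts:
        return END  # the stub always tries to stop right after an attempt
    waits = tokens.count("wait")
    return attempts[min(waits, len(attempts) - 1)]


print(generate_with_wait(toy_step, ["Q:", "2+2?"], n_forced=0))
print(generate_with_wait(toy_step, ["Q:", "2+2?"], n_forced=1))
```

In the stub the second attempt is better by construction; a real model gives no such guarantee, which is why the result is only "usually" better.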

▲ vrighter 4 days ago | parent [-]

the "pick something usually better" sounds a lot like "and then draw the rest of the f*** owl"

	▲	ben_w 4 days ago | parent [-]
		It turned out that for a lot of things (not all things; Transformers have a lot of weaknesses), using a neural network to score an output is, if not "fine", then at least "ok". Generating 10 options with a mediocre mean and some standard deviation, then evaluating which is best, is much easier than deliberative reasoning that gets one thing right on the first try more often.
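A rough sketch of that generate-then-score idea (best-of-N sampling), under loudly stated assumptions: the generator is faked as a Gaussian "quality" draw with mean 0.5 and standard deviation 0.2, and the neural scorer is faked as the true quality plus Gaussian noise. `best_of_n`, `generate`, and `noisy_score` are illustrative names, not a real API, and the numbers are made up.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def best_of_n(generate: Callable[[], T], score: Callable[[T], float], n: int = 10) -> T:
    """Draw n candidates and keep the one the scorer likes best."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)


rng = random.Random(0)  # fixed seed so repeated runs agree


def generate() -> float:
    # Assumed toy generator: true quality with a mediocre mean and some spread.
    return rng.gauss(0.5, 0.2)


def noisy_score(quality: float) -> float:
    # Assumed toy scorer: an imperfect but correlated estimate of quality.
    return quality + rng.gauss(0.0, 0.1)


trials = 2000
single = sum(generate() for _ in range(trials)) / trials
picked = sum(best_of_n(generate, noisy_score, 10) for _ in range(trials)) / trials
print(f"mean quality, one sample: {single:.3f}")
print(f"mean quality, best of 10: {picked:.3f}")
```

The scorer only has to rank candidates roughly right; it never has to produce a good answer itself, which is the sense in which "ok" scoring is enough.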