▲ dpflan 11 hours ago
The more verifiable the domain, the better suited it is. We see similar reports of benefits in advanced mathematics research from Terence Tao, though some of those reports seem to amount to the LLM surfacing prior results that very few people knew were relevant to the proof, but that happened to be in its training corpus. Still, verifiably correct domains are well-suited. So the concept of formal verification is as relevant as ever, and when building interconnected programs the complexity rises and verification becomes more difficult.
▲ root_axis 10 hours ago | parent
> The more verifiable the domain, the better suited it is.

Absolutely. It's also worth noting that in the case of Tao's work, the LLM was producing Lean and Python code.
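That matters because Lean output is machine-checkable: the compiler rejects the file unless the proof actually goes through, so the model's output can be validated automatically. A minimal illustration (not from Tao's work), assuming Lean 4:

    -- This file only compiles if the proof is valid, so the
    -- correctness of generated code can be checked mechanically.
    theorem add_comm_example (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b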
▲ 2001zhaozhao 10 hours ago | parent
I think the solution in harder-to-verify cases is to give AI (sub-)agents a really good, detailed set of guidelines for what they should do: how to think, how to explore, how to break down problems. Potentially tens of thousands of words of instructions, to get the LLM to act as a competent employee in the field. The models then need to be good enough at instruction-following to actually explore the problem in the right way and apply basic intelligence to solve it. Basically, treat the LLM as a competent general knowledge worker that is unfamiliar with the specific field, and give it detailed instructions on how to succeed in that field.

For easy-to-verify fields like coding, you can train "domain intuitions" directly into the LLM (and some of that training should generalize to other knowledge work), but for other fields you would need to supply those intuitions in the prompt, since they can't be trained into the LLM directly. This will need better models, but it might become doable in a few generations.
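As a rough sketch of what that could look like, assuming the OpenAI Python SDK and placeholder file, model, and task names, the field-specific guideline document could simply be loaded into the sub-agent's system prompt:

    # Sketch: run a sub-agent whose "domain intuition" comes from a long
    # guideline document injected as the system prompt. All names are placeholders.
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()

    # Tens of thousands of words of field-specific guidance, maintained by a human expert.
    guidelines = Path("domain_guidelines.md").read_text()

    def run_subagent(task: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder; any strong instruction-following model
            messages=[
                # The guidelines stand in for intuitions the model wasn't trained on.
                {"role": "system", "content": guidelines},
                {"role": "user", "content": task},
            ],
        )
        return response.choices[0].message.content

    print(run_subagent("Draft a structured analysis of the attached case notes."))

Here the guideline document plays the role that RL on verifiable feedback plays for coding: it supplies the domain-specific judgment, while the base model contributes the general instruction-following ability.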