danaris | 3 days ago
LLMs, at least as they exist today, could be trained to be very good at producing text that looks like formal specifications, but there would be no way to guarantee that what they produced was (a) correctly formed, or (b) correctly specifying what was actually asked of them. You might be able to do better on (a) by feeding the LLM's output through a solver and forcing it to redo anything that fails (this is where my knowledge of current LLMs kinda falls down), but (b) is still a fundamentally hard problem.
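In sketch form, the solver-in-the-loop idea is just a retry loop. Here's a rough Python version, where ask_llm is a hypothetical placeholder for a real model call and SMT-LIB/Z3 stand in for whatever spec language you'd actually target; note it only ever checks (a):

    # Check (a) only: run the candidate spec through a solver, retry on failure.
    # Intent (b) is never checked here.
    from z3 import Solver, parse_smt2_string, sat, Z3Exception

    def ask_llm(prompt):
        raise NotImplementedError("placeholder for a real model call")

    def spec_with_retries(requirements, max_tries=5):
        prompt = "Write an SMT-LIB 2 specification for:\n" + requirements
        for _ in range(max_tries):
            candidate = ask_llm(prompt)
            try:
                assertions = parse_smt2_string(candidate)  # is it even well formed?
            except Z3Exception as err:
                prompt += "\nThat failed to parse (" + str(err) + "). Try again."
                continue
            solver = Solver()
            solver.add(assertions)
            if solver.check() == sat:  # at least internally consistent
                return candidate       # whether it matches intent is still unverified
            prompt += "\nThat spec was unsatisfiable. Try again."
        raise RuntimeError("no well-formed spec after retries")

A passing check here only means the output parses and isn't self-contradictory; nothing in the loop touches (b).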
wakawaka28 | 2 days ago
I don't see why two LLMs together (or one, alternating between tasks) could not separately develop a spec and an implementation. The human input could be a set of abstract requirements, with the two systems interacting and cross-checking each other against those requirements, perhaps with some code/spec reviews by humans. I really don't see it ever working without one or more humans in the loop, if only to confirm that what is being done is actually what the human(s) intended. The humans would ideally have to say as little as possible to get what they want. Unless/until we get powerful AGI, we will need technical human reviewers.
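For what it's worth, a rough shape of that spec-agent/impl-agent arrangement might look like the Python below, where draft_spec, draft_impl, and critique are hypothetical placeholders rather than any real framework's API:

    def draft_spec(requirements, feedback=""):
        raise NotImplementedError("placeholder for a spec-writing model call")

    def draft_impl(requirements, spec, feedback=""):
        raise NotImplementedError("placeholder for a code-writing model call")

    def critique(question, *artifacts):
        raise NotImplementedError("placeholder for a reviewing model call; returns a list of issues")

    def cross_check(requirements, rounds=3):
        spec = draft_spec(requirements)
        impl = draft_impl(requirements, spec)
        for _ in range(rounds):
            # each side reviews the other's artifact against the human's requirements
            spec_issues = critique("Does this spec capture the requirements?", requirements, spec)
            impl_issues = critique("Does this code satisfy the spec?", spec, impl)
            if not spec_issues and not impl_issues:
                break  # the two sides agree; a human still has to confirm intent
            spec = draft_spec(requirements, feedback=spec_issues)
            impl = draft_impl(requirements, spec, feedback=impl_issues)
        return spec, impl

The loop stops when neither side has objections, which is exactly the point where the human reviewers come in to confirm intent.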
dontlaugh | 3 days ago
Some people already use probabilistic methods to automatically produce proofs for specifications and code they wrote by hand. It's one of the few genuinely useful potential use cases for LLMs.
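That workflow is essentially guess-and-check: the model only proposes proofs, and the proof checker decides whether to accept them. A rough Python sketch, assuming Lean as the checker purely as an example (the parent doesn't name a prover) and a hypothetical propose_proof model call:

    import pathlib, subprocess, tempfile

    def propose_proof(lean_source_with_gaps, feedback=""):
        raise NotImplementedError("placeholder for a real model call; returns a complete .lean file")

    def prove(lean_source_with_gaps, max_tries=5):
        feedback = ""
        for _ in range(max_tries):
            candidate = propose_proof(lean_source_with_gaps, feedback)
            with tempfile.TemporaryDirectory() as tmp:
                path = pathlib.Path(tmp) / "Attempt.lean"
                path.write_text(candidate)
                result = subprocess.run(["lean", str(path)], capture_output=True, text=True)
            if result.returncode == 0:
                return candidate        # the proof checker, not the model, is the source of trust
            feedback = result.stderr    # feed the checker's errors back into the next attempt
        raise RuntimeError("no accepted proof after retries")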