skissane 6 days ago

> but none of the LLMs (open source or not) are capable of backtracking even though there is plenty of room for a basic Prolog interpreter. This seems like a very obvious shortcoming to me that no amount of smooth approximation can overcome.

The fundamental autoregressive architecture is absolutely capable of backtracking… we generate next-token probabilities, select a next token, then calculate probabilities for the token after that.

There is absolutely nothing stopping you from “rewinding” to an earlier token, making a different selection and replaying from that point. The basic architecture absolutely supports it.
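
To make the "rewinding" concrete, here is a minimal sketch on top of a standard Hugging Face causal LM. The looks_bad check is a hypothetical stand-in for whatever signal would trigger a retry (a verifier, a reward model, a log-probability threshold); the point is only that the decoding loop can drop the last few tokens and replay from an earlier position with a fresh sample.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    def looks_bad(ids: torch.Tensor) -> bool:
        # Hypothetical placeholder: a real system might call a verifier,
        # a reward model, or check the running log-probability here.
        return False

    @torch.no_grad()
    def sample_with_backtracking(prompt: str, max_new: int = 50,
                                 rewind: int = 3, max_retries: int = 4) -> str:
        ids = tok(prompt, return_tensors="pt").input_ids[0]
        prompt_len = ids.shape[0]
        retries = 0
        while ids.shape[0] - prompt_len < max_new:
            # Ordinary autoregressive step: next-token distribution, then sample.
            logits = model(ids.unsqueeze(0)).logits[0, -1]
            next_id = torch.multinomial(torch.softmax(logits, dim=-1), 1)
            ids = torch.cat([ids, next_id])
            if looks_bad(ids) and retries < max_retries:
                # "Rewind": discard the last few tokens (never the prompt)
                # and replay from that point with a different selection.
                ids = ids[:max(prompt_len, ids.shape[0] - rewind)]
                retries += 1
        return tok.decode(ids[prompt_len:])

    print(sample_with_backtracking("The proof proceeds by"))

The only engineering cost is recomputing (or truncating the cached attention state for) the suffix that was discarded.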

Why then has nobody implemented it? Maybe this kind of backtracking isn't really that useful.

versteegen 6 days ago | parent | next [-]

Yes, but LLMs themselves are perfectly capable of backtracking their reasoning even while sampling only runs forwards, in the same way humans do: by deciding something doesn't work and trying something else. Humans DON'T travel backwards in time so that the erroneous thought never happened in the first place.

measurablefunc 6 days ago | parent | prev [-]

Where is this spelled out formally and proven logically?

skissane 6 days ago | parent [-]

LLM backtracking is an active area of research, see e.g.

https://arxiv.org/html/2502.04404v1

https://arxiv.org/abs/2306.05426

And I was wrong that nobody has implemented it; as these papers show, people have… it is just that the results haven't been impressive enough to carry it from the research lab into industrial use, or at least not yet.

measurablefunc 6 days ago | parent | next [-]

> Empirical evaluations demonstrate that our proposal significantly enhances the reasoning capabilities of LLMs, achieving a performance gain of over 40% compared to the optimal-path supervised fine-tuning method.

afiori 6 days ago | parent | prev [-]

I would expect to see something like this soonish, since right about now we are seeing the end of training-time scaling and the beginning of inference-time scaling.

foota 6 days ago | parent [-]

This is a neat observation: training has been optimized to hell, and inference is just beginning.