| ▲ | measurablefunc 6 days ago |
| I don't expect a Markov chain to be capable of backtracking. That's the point I am making. Logical reasoning as it is implemented in Prolog interpreters is not something that can be done w/ LLMs regardless of the size of their weights, biases, & activation functions between the nodes in the graph. |
|
| ▲ | bondarchuk 6 days ago | parent | next [-] |
| Imagine the context window contains A-B-C, C turns out to be a dead end, and we want to backtrack to B and try another branch. The LLM could then produce outputs such that the context window becomes A-B-C-[backtrack-back-to-B-and-don't-do-C], which after some more tokens could become A-B-C-[backtrack-back-to-B-and-don't-do-C]-D. This would essentially be backtracking, and I don't see why it would be inherently impossible for LLMs as long as the different branches fit in context. |
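The scheme described above can be sketched as a plain token loop. Here `generate` is a hard-coded stub standing in for the model, and the exact spelling of the backtrack marker is made up for illustration:

```python
# Minimal sketch of backtracking-in-context. `generate` is a stub policy
# standing in for an LLM; it sees the whole trace (including backtrack
# markers), so the "don't do C again" information lives in the context.
def generate(context):
    last = context[-1]
    if last == "B":
        return "C"                          # try branch C first
    if last == "C":
        return "[backtrack-to-B-skip-C]"    # C was a dead end
    if last == "[backtrack-to-B-skip-C]":
        return "D"                          # resume from B, try D instead
    return "DONE"

trace = ["A", "B"]
while trace[-1] != "DONE":
    trace.append(generate(trace))

print(trace)
# ['A', 'B', 'C', '[backtrack-to-B-skip-C]', 'D', 'DONE']
```

The dead-end branch C is never erased; it stays in the context, and the marker after it is what steers generation down a different branch.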
| |
| ▲ | measurablefunc 6 days ago | parent [-] | | If you think it is possible then I'd like to see an implementation of a sudoku puzzle solver as a Markov chain. This is a simple enough problem that it can be implemented in a few dozen lines of Prolog, but I've never seen a solver implemented as a Markov chain. | | |
| ▲ | Ukv 6 days ago | parent | next [-] |

> If you think it is possible then I'd like to see an implementation of a sudoku puzzle solver as Markov chain

Have each of the Markov chain's states be one of 10^81 possible sudoku grids (a 9x9 grid of digits 1-9 and blank), then calculate the 10^81-by-10^81 transition matrix that takes each incomplete grid to the valid complete grid containing the same numbers. If you want you could even have it fill one square at a time rather than jump right to the solution, though there's no need to.

Up to you what you do for ambiguous inputs (select one solution at random to give 1.0 probability in the transition matrix? equally weight valid solutions? have the states be sets of boards and map to the set of all valid solutions?) and impossible inputs (map to itself? have the states be sets of boards and map to the empty set?).

Could say that's "cheating" by pre-computing the answers and hard-coding them in a massive input-output lookup table, but to my understanding that's also the only sense in which there's equivalence between Markov chains and LLMs.
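The 10^81-state construction is hopeless to materialize, but it can be made concrete on a toy domain. A sketch using 2x2 Latin squares (3^4 = 81 states, 0 = blank) in place of 9x9 sudoku, with the transition matrix stored as a sparse dict and ambiguous grids mapped to one arbitrarily chosen solution, per the options listed above:

```python
from itertools import product

# Toy analogue of the lookup-table construction: 2x2 Latin squares over
# {1, 2} instead of 9x9 sudoku, so the state space is 3^4 = 81 grids
# (0 = blank, grids stored row-major). The "transition matrix" is a
# sparse 0/1 dict: each solvable partial grid moves with probability 1.0
# to (one) completion.

def valid(g):
    return ({g[0], g[1]} == {1, 2} and {g[2], g[3]} == {1, 2}
            and {g[0], g[2]} == {1, 2} and {g[1], g[3]} == {1, 2})

solutions = [g for g in product((1, 2), repeat=4) if valid(g)]

transition = {}
for g in product((0, 1, 2), repeat=4):          # all 81 partial grids
    fits = [s for s in solutions
            if all(c in (0, sc) for c, sc in zip(g, s))]
    if fits:
        transition[g] = fits[0]   # ambiguous grid: pick one solution

# "Running the chain" is a single lookup:
print(transition[(1, 0, 0, 0)])   # -> (1, 2, 2, 1)
```

Impossible grids simply have no row here (equivalently, they could map to themselves or to an empty set, as the comment notes); nothing about the construction changes at 9x9 except the astronomical size.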
| ▲ | measurablefunc 6 days ago | parent [-] |

There are multiple solutions for each incomplete grid, so how are you calculating the transitions for a grid w/ a non-unique solution?

Edit: I see you added questions for the ambiguities, but modulo those choices your solution will only almost work, b/c it is not entirely extensionally equivalent. The transition graph and solver are almost extensionally equivalent, but whereas the Prolog solver will backtrack, there is no backtracking in the Markov chain, and you have to re-run the chain multiple times to find all the solutions.
| ▲ | Ukv 6 days ago | parent | next [-] | | > but whereas the Prolog solver will backtrack there is no backtracking in the Markov chain and you have to re-run the chain multiple times to find all the solutions If you want it to give all possible solutions at once, you can just expand the state space to the power-set of sudoku boards, such that the input board transitions to the state representing the set of valid solved boards. | | |
| ▲ | measurablefunc 6 days ago | parent | next [-] | | That still won't work b/c there is no backtracking. The point is that there is no way to encode backtracking/choice points like in Prolog w/ a Markov chain. The argument you have presented is not extensionally equivalent to the Prolog solver. It is almost equivalent but it's missing choice points for starting at a valid solution & backtracking to an incomplete board to generate a new one. The typical argument for absorbing states doesn't work b/c sudoku is not a typical deterministic puzzle. | | |
| ▲ | Ukv 6 days ago | parent [-] |

> That still won't work b/c there is no backtracking.

It's essentially just a lookup table mapping from input board to the set of valid output boards - there's no real way for it not to work (obviously not practical though). If board A has valid solutions B, C, D, then the transition matrix cell mapping {A} to {B, C, D} is 1.0, and all other entries in that row are 0.0.

> The point is that there is no way to encode backtracking/choice points

You can if you want, keeping the same variables as a regular sudoku solver as part of the Markov chain's state and transitioning instruction-by-instruction, rather than mapping directly to the solution - just that there's no particular need to when you've precomputed the solution.
| ▲ | measurablefunc 6 days ago | parent [-] | | My point is that your initial argument was missing several key pieces & if you specify the entire state space you will see that it's not as simple as you thought initially. I'm not saying it can't be done but that it's actually much more complicated than simply saying just take an incomplete board state s & uniform transitions between s, s' for valid solutions s' that are compatible with s. In fact, now that I spelled out the issues I still don't think this is a formal extensional equivalence. Prolog has interactive transitions between the states & it tracks choice points so compiling a sudoku solver to a Markov chain requires more than just tracking the board state in the context. | | |
| ▲ | Ukv 6 days ago | parent [-] |

> My point is that your initial argument was missing several key pieces

My initial example was a response to "If you think it is possible then I'd like to see an implementation of a sudoku puzzle solver as Markov chain", describing how a sudoku solver could be implemented as a Markov chain. I don't think there's anything missing from it - it solves all proper sudokus, and I only left open the choice of how to handle improper sudokus because that was unspecified (but it's trivial regardless of what's wanted).

> I'm not saying it can't be done but that it's actually much more complicated

If that's the case, then I did misinterpret your comments as saying it can't be done. But I don't think it's really complicated, regardless of whatever "ok but now it must encode choice points in its state" requirements are thrown at it - it's just a state-to-state transition look-up table.

> so compiling a sudoku solver to a Markov chain requires more than just tracking the board state in the context.

As noted, you can keep all the same variables as a regular sudoku solver as part of the Markov chain's state and transition instruction-by-instruction, if that's what you want. If you mean inputs from a user, the same is true of LLMs, which are typically run interactively. Either model the whole universe, including the user, as part of the state transition table (maybe impossible, depending on your beliefs about the universe), or have user interaction take the current state, modify it, and use it as the initial state for a new run of the Markov chain.
| ▲ | measurablefunc 6 days ago | parent [-] | | > As noted, you can keep all the same variables as a regular Sudoku solver What are those variables exactly? | | |
| ▲ | Ukv 6 days ago | parent [-] | | For a depth-first solution (backtracking), I'd assume mostly just the partial solutions and a few small counters/indices/masks - like for tracking the cell we're up to and which cells were prefilled. Specifics will depend on the solver, but they can be made part of the Markov chain's state regardless. |
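That remark can be sketched concretely on the same toy 2x2 Latin-square domain (details assumed for illustration): fold the solver's entire variable set - board, trail of choices, next value to try - into the chain's state, and every transition, including the backtracking ones, becomes a deterministic (probability 1.0) step:

```python
# A generate-and-test backtracking solver whose *entire* variable set
# (board, trail of choices, next value to try) is the Markov chain's
# state. Toy domain: 2x2 Latin squares over {1, 2}, 0 = blank. Every
# `step` is a probability-1.0 transition; "backtracking" is just the
# transitions that pop the trail.

def valid(b):
    return ({b[0], b[1]} == {1, 2} and {b[2], b[3]} == {1, 2}
            and {b[0], b[2]} == {1, 2} and {b[1], b[3]} == {1, 2})

def step(state):
    board, trail, nxt = state
    if 0 not in board:
        if valid(board):
            return state                      # absorbing state: solved
    elif nxt <= 2:                            # forward move: try `nxt`
        i = board.index(0)
        return (board[:i] + (nxt,) + board[i+1:], trail + ((i, nxt),), 1)
    j, u = trail[-1]                          # dead end: backtrack and
    return (board[:j] + (0,) + board[j+1:], trail[:-1], u + 1)

state = ((0, 0, 0, 0), (), 1)                 # start from the empty board
while step(state) != state:                   # iterate to the fixed point
    state = step(state)

print(state[0])
# (1, 2, 2, 1)
```

Run on the empty board, the chain walks forward, hits invalid complete boards like (1, 1, 1, 1), pops the trail repeatedly, and finally absorbs at a valid square - the same trajectory a depth-first Prolog search would take, encoded purely as state-to-state transitions.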
| ▲ | Certhas 5 days ago | parent | prev [-] | | People really don't appreciate what is possible in infinite (or more precisely: arbitrarily high) dimensional spaces. |
| |
|
| |
| ▲ | bboygravity 6 days ago | parent | prev | next [-] |

The LLM can just write the Prolog and solve the sudoku that way. I don't get your point. LLMs like Grok 4 can probably one-shot this today with the current state of the art. You can likely just ask it to solve any sudoku and it will do it (by writing code in the background, running it, and returning the result). And this is still very early stage compared to what will be out a year from now.

Why does it matter how it does it, or whether this is strictly an LLM or an LLM with tools, for any practical purpose?
| ▲ | PhunkyPhil 6 days ago | parent [-] |

The point isn't whether the output is correct or not, it's whether the actual net is doing "logical computation" à la Prolog.

What you're suggesting is akin to me saying you can't build a house, then you going and hiring someone to build a house. _You_ didn't build the house.
| ▲ | kaibee 6 days ago | parent [-] |

I feel like you're kinda proving too much. By the same reasoning, humans/programmers aren't generally intelligent either, because we can only mentally simulate relatively small state spaces of programs, and when my boss tells me to go build a tool, I'm not exactly writing raw x86 assembly. I didn't _build_ the tool, I just wrote text that instructed a compiler how to build the tool. The whole reason we invented SAT solvers is that we're not smart in that way. But I feel like you're trying to argue that LLMs at any scale are gonna be less capable than an average person?
|
| |
| ▲ | lelanthran 6 days ago | parent | prev | next [-] |

> If you think it is possible then I'd like to see an implementation of a sudoku puzzle solver as Markov chain. This is a simple enough problem that can be implemented in a few dozen lines of Prolog but I've never seen a solver implemented as a Markov chain.

I think it can be done. I started a chatbot that works like this some time back (2024) but have paused work on it since January. In brief, you shorten the context by discarding the context that didn't work out.
| ▲ | sudosysgen 6 days ago | parent | prev [-] |

You can do that pretty trivially for any fixed-size problem (as in, solvable with a fixed-size-tape Turing machine); you'll just have a titanically huge state space. The claim of the LLM folks is that the models have a huge state space (they do have a titanically huge state space) and can navigate it efficiently.

Simply have a deterministic Markov chain where each state is a possible value of the tape + state of the TM, and which transitions accordingly.
| ▲ | Certhas 6 days ago | parent | prev | next [-] |
| Take a finite-tape Turing machine with a tape alphabet of N symbols and tape length T, giving N^T possible tape contents. Now consider that you have a probability for each configuration instead of a definite configuration. The transitions of the Turing machine induce transitions of the probabilities, and these transitions define a Markov chain on an N^T-dimensional probability space.

Is this useful? Absolutely not. It's just a trivial rewriting. But it shows that high-dimensional spaces are extremely powerful: you can trade off sophisticated transition rules for high dimensionality. |
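The rewriting above can be spelled out for a deliberately tiny machine (the rule table below is an arbitrary example, not anything from the thread):

```python
from itertools import product

# A tiny machine: 2 machine states, binary alphabet, tape length T = 3.
# A configuration is (state, head, tape); the step rule induces a 0/1
# transition "matrix" over all 2 * 3 * 2^3 = 48 configurations, i.e. a
# deterministic Markov chain. Rule table is an arbitrary example.

RULES = {  # (state, read) -> (new_state, write, move)
    (0, 0): (0, 1, +1),
    (0, 1): (1, 0, +1),
    (1, 0): (1, 1, -1),
    (1, 1): (0, 0, -1),
}

def tm_step(cfg):
    state, head, tape = cfg
    ns, w, mv = RULES[(state, tape[head])]
    tape = tape[:head] + (w,) + tape[head + 1:]
    head = min(max(head + mv, 0), len(tape) - 1)  # clamp at tape ends
    return (ns, head, tape)

configs = [(s, h, t) for s in (0, 1) for h in range(3)
           for t in product((0, 1), repeat=3)]
chain = {c: tm_step(c) for c in configs}          # row-stochastic, 0/1

print(len(chain), chain[(0, 0, (0, 0, 0))])
# 48 (0, 1, (1, 0, 0))
```

Each row of the induced matrix has a single 1.0 entry; a distribution over configurations evolves by exactly this lookup, which is the trivial-but-legitimate sense in which the machine "is" a Markov chain.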
|
| ▲ | vidarh 6 days ago | parent | prev | next [-] |
| A (2,3) Turing machine can be trivially implemented with a loop around an LLM that treats the context as an IO channel, and a Prolog interpreter runs on a Turing-complete computer, so per Turing equivalence you can run a Prolog interpreter on an LLM. Of course this would be pointless, but it demonstrates that a system where an LLM provides the logic can backtrack, as there's nothing computationally special about backtracking. That current UIs to LLMs are set up for conversation-style use that makes this harder isn't an inherent limitation of what we can do with LLMs. |
| |
| ▲ | measurablefunc 6 days ago | parent [-] | | Loop around an LLM is not an LLM. | | |
| ▲ | vidarh 6 days ago | parent [-] | | Then no current systems you are using are LLMs | | |
| ▲ | measurablefunc 6 days ago | parent [-] |

Choice-free feedforward graphs are LLMs. The inputs/outputs are extensionally equivalent to the context and transition probabilities of a Markov chain. What exactly is your argument? Because what it looks like to me is that you're simply making a Turing-tarpit argument, which does not address any of my points.
| ▲ | vidarh 6 days ago | parent [-] |

My argument is that artificially limiting your claims to a subset of the systems people are actually using, and then arguing about the limitations of that subset, makes your argument irrelevant to what people are actually using.
| ▲ | measurablefunc 4 days ago | parent [-] |

So where is the error exactly? Adding a loop is simply a repetition of the argument for the equivalence between an LLM & a Markov chain. It doesn't matter how many times you sample trajectories from either one; they're still extensionally equivalent.
| ▲ | vidarh 2 days ago | parent [-] |

Since an LLM with a loop is trivially and demonstrably Turing complete if you allow it to use the context as an IO channel (and thereby memory), arguing that there's some limitation preventing an LLM from doing what Prolog can is, by extension, logically invalid. In other words, this claim is categorically false:

> Logical reasoning as it is implemented in Prolog interpreters is not something that can be done w/ LLMs regardless of the size of their weights, biases, & activation functions between the nodes in the graph.

What limits "just" an LLM is not the ability of the model to encode reasoning, but the lack of the minimal, trivial runtime scaffolding that would let it use its capabilities.
| ▲ | measurablefunc 2 days ago | parent [-] | | > Since an LLM with a loop is trivially and demonstrably Turing complete Where is the demonstration? | | |
| ▲ | vidarh 2 days ago | parent [-] |

In every LLM app you have available that can set temperature. You can try a template like this:

    Turing machine transition table:
    q₀, 0 → q₀, 0, R
    q₀, 1 → q₁, 1, R
    q₀, B → q₀, B, L
    q₁, 0 → q₁, 0, R
    q₁, 1 → q₀, 1, R
    q₁, B → q₁, B, L

    Current state: [STATE]
    Current symbol: [SYMBOL]

    What is the next transition?
    Format: NEW_STATE|WRITE_SYMBOL|DIRECTION

You might have to tweak it depending on the model you use, but with temperature set to 0, once you've tweaked it so it works on that model for all 6 combinations of state and symbol, it will continue working. Repeat it in a loop, feeding in IO as indicated by the output.

If you struggle with the notion that an LLM can handle a lookup table from 6 combinations of state and symbol to 6 tuples, you're looking for excuses to disagree and/or don't understand how simple a UTM is. The above is a sufficient definition for someone who understands a UTM to figure out how to execute the machine step by step.
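The scaffolding loop being described is tiny. A sketch with a hard-coded stub in place of the actual model call (a real harness would send the template above at temperature 0; the function and variable names here are hypothetical):

```python
# Loop-around-an-LLM sketch. `llm` is a stub standing in for a
# temperature-0 completion of the prompt template; the scaffolding only
# parses "NEW_STATE|WRITE_SYMBOL|DIRECTION" replies and maintains the
# tape, which is exactly the IO-channel/memory role described above.

TABLE = {("q0", "0"): "q0|0|R", ("q0", "1"): "q1|1|R", ("q0", "B"): "q0|B|L",
         ("q1", "0"): "q1|0|R", ("q1", "1"): "q0|1|R", ("q1", "B"): "q1|B|L"}

def llm(state, symbol):
    # Stand-in for the model reproducing the 6-row lookup table.
    return TABLE[(state, symbol)]

def run(tape, max_steps):
    state, head = "q0", 0
    for _ in range(max_steps):
        state, write, move = llm(state, tape[head]).split("|")
        tape[head] = write
        head += 1 if move == "R" else -1
        if not 0 <= head < len(tape):
            break                      # fell off the finite demo tape
    return state, tape

print(run(list("0110"), 10))
# ('q0', ['0', '1', '1', '0'])
```

Everything stateful lives in the loop, not the model: the model only has to answer a fixed 6-entry lookup, which is the sense in which the scaffolding, not the weights, is the missing piece.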
| ▲ | measurablefunc a day ago | parent [-] | | Let me know when you manage to implement the Fibonacci sequence w/ this. It should be doable since you seem to think you already have a Turing machine implemented on an LLM w/ a loop around it. |