godelski 5 days ago

Calf isn't making an appeal to authority. They are saying "I'm not the idiot you think I am." Two very different things. It's likely also a request to talk to them in more mathematical terms.

I read your link btw and I just don't know how someone can do all that work and not establish the Markov Property. That's like the first step. Speaking of which, I'm not sure I even understand the first definition in your link. I've never heard the phrase "computably countable" before, but I have heard of "computable numbers," which are countable. Is that what it's referring to? I'll assume so. (My dissertation wasn't on models of computation, it was on neural architectures.) In 1.2.2, is there a reason the noise must be strictly uniform? It also seems to run counter to the deterministic setting.

Regardless, I agree with Calf: it's very clear that MCs are not equivalent to LLMs, so the equivalence claim is trivially false. But whether an LLM can be represented by a MC is a different question. I did find this paper on the topic [0], but I need to give it a closer read. It does look like it was rejected from ICLR [1], though ML review is very noisy. I'm including the link because the comments are more informative than the accept/reject signal.

(@Calf, sorry I didn't respond to your comment; I wasn't trying to make a point about the relationship between LLMs and MCs, only that more fundamental research was being overshadowed.)

[0] https://arxiv.org/abs/2410.02724

[1] https://openreview.net/forum?id=RDFkGZ9Dkh

measurablefunc 4 days ago

If it's trivially false then you should be able to present a counter-example, but so far no one has done that; there has just been a lot of hand-waving about "trivialities" of one sort or another.

Neural networks are stateless: the output depends only on the current input, so the Markov property is trivially/vacuously true. The reason for the uniform random number when sampling from the CDF¹ is that if you have the cumulative distribution function of a probability density, you can sample from the distribution using a uniformly distributed RNG.

¹https://stackoverflow.com/questions/60559616/how-to-sample-f...
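
Concretely, a rough sketch of that idea (the function name is mine, not from the linked post; the Rayleigh CDF is just an example): draw u ~ Uniform(0, 1) and return F⁻¹(u), where F is the target CDF.

  # Inverse-transform sampling sketch for the Rayleigh distribution,
  # whose CDF is F(x) = 1 - exp(-x^2 / (2*sigma^2)).
  import numpy as np

  def sample_rayleigh_inverse_cdf(sigma, size, seed=None):
      rng = np.random.default_rng(seed)
      u = rng.uniform(0.0, 1.0, size)               # the uniform draws
      return sigma * np.sqrt(-2.0 * np.log1p(-u))   # F^{-1}(u), since log1p(-u) = ln(1 - u)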

godelski 4 days ago

You want me to show that it is trivially false that all Neural Networks are Markov Chains? We could point to an RNN, which doesn't have the Markov Property. Another trivial case is when the rows do not sum to 1: the internal states of neural networks are not required to be probability distributions. In fact, this isn't a requirement anywhere in a neural network. So whatever you want to call the transition matrix, you're going to have issues.
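
To make the RNN point concrete, a toy sketch (arbitrary weights, not taken from any particular model): the same current input produces different outputs depending on earlier inputs, so the output is not a function of the current input alone.

  import numpy as np

  W_h, W_x = 0.9, 1.0          # recurrent and input weights, chosen arbitrarily

  def rnn_output(sequence):
      h = 0.0                  # hidden state carries the history
      for x in sequence:
          h = np.tanh(W_h * h + W_x * x)
      return h

  print(rnn_output([0.0, 1.0]))  # final input is 1.0
  print(rnn_output([5.0, 1.0]))  # same final input, different output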

Or the converse of this, that all Markov Chains are Neural Networks? Sure, here's my transition matrix: [1].
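
Spelled out, that degenerate chain (trivial, but a valid Markov chain):

  import numpy as np

  P = np.array([[1.0]])                    # the single-state transition matrix [1]
  assert np.allclose(P.sum(axis=1), 1.0)   # rows sum to 1, i.e. row-stochastic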

I'm quite positive an LLM would be able to give you more examples.

  > the output only depends on the current input so the Markov property is trivially/vacuously true.
It's pretty clear you did not get your PhD in ML.

  > The reason for the uniform random number 
I think you're misunderstanding. Or maybe I am. But I'm failing to understand why you're jumping to the CDF. I also don't see how this answers my question, since there are other ways to sample from a distribution knowing only its CDF, without using the uniform distribution. You can always convert to the uniform distribution, and there are lots of tricks for that. For that matter, the distribution in that SO post is the Rayleigh distribution, so we don't even need to do that. My question wasn't whether uniform is a clean choice, but whether it is a requirement. This just doesn't seem relevant at all.
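
One illustration of that point (a sketch; the helper name is mine): the Rayleigh can be drawn without an explicit Uniform(0, 1) step at all, as the norm of two independent Gaussians, since if X, Y ~ N(0, sigma^2) then sqrt(X^2 + Y^2) is Rayleigh(sigma).

  import numpy as np

  def sample_rayleigh_from_gaussians(sigma, size, seed=None):
      rng = np.random.default_rng(seed)
      x = rng.normal(0.0, sigma, size)
      y = rng.normal(0.0, sigma, size)
      return np.hypot(x, y)    # sqrt(x^2 + y^2), Rayleigh-distributed with scale sigma
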
measurablefunc 3 days ago

Either find the exact error in the proof or stop running around in circles. The proof is very simple, so if there is an error anywhere in it you should be able to find it easily, but you haven't done that. You have only asked for unrelated clarifications & gone on unrelated tangents.

godelski 2 days ago

  > Either find the exact error in the proof
I think I did

  > You have only asked for unrelated clarifications & gone on unrelated tangents.
I see the problem...
measurablefunc 2 days ago

> I see the problem

That's great, so you should be able to spell out the error & why it is an error. Go ahead.