versteegen 5 days ago

Specifically, an order-n Markov chain such as a transformer, if not otherwise restricted, can have any joint distribution you wish over the first n-1 steps, so it can exhibit any extensional property there. In that case, you have to look at intensional properties (how the model computes its outputs, not just which distribution it produces) to draw non-vacuous conclusions.
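To make the extensional point concrete, here is a minimal Python sketch. It models an unrestricted lookup table, not an actual transformer, and the helper names (`random_joint`, `conditionals_from_joint`, `sequence_prob`) are hypothetical. The assumption is that every conditional can see the entire history, which holds while the prefix is shorter than the model's order. Under that assumption, the chain rule turns any joint distribution into next-token conditionals that reproduce it exactly.

```python
import itertools
import random
from collections import defaultdict


def random_joint(vocab, length, seed=0):
    """Hypothetical target: an arbitrary distribution over all sequences of `length`."""
    rng = random.Random(seed)
    seqs = list(itertools.product(vocab, repeat=length))
    weights = [rng.random() for _ in seqs]
    total = sum(weights)
    return {s: w / total for s, w in zip(seqs, weights)}


def conditionals_from_joint(joint):
    """Chain rule: P(x_t | x_<t) = P(x_<=t) / P(x_<t), computed from prefix marginals.

    Returns a dict keyed by prefix; cond[prefix] is the probability of the
    last token of `prefix` given everything before it.
    """
    prefix_mass = defaultdict(float)
    for seq, p in joint.items():
        for t in range(len(seq) + 1):
            prefix_mass[seq[:t]] += p
    cond = {}
    for prefix, mass in prefix_mass.items():
        if prefix:  # the empty prefix has no token to condition on
            cond[prefix] = mass / prefix_mass[prefix[:-1]]
    return cond


def sequence_prob(cond, seq):
    """Probability the autoregressive model assigns to `seq`: product of conditionals."""
    p = 1.0
    for t in range(1, len(seq) + 1):
        p *= cond[seq[:t]]
    return p


if __name__ == "__main__":
    joint = random_joint("ab", length=3)
    cond = conditionals_from_joint(joint)
    max_err = max(abs(sequence_prob(cond, s) - p) for s, p in joint.items())
    print(max_err < 1e-12)  # True
```

Because any target joint can be matched this way, observing that a model produces some particular distribution over short prefixes tells you nothing about its architecture; that is the sense in which purely extensional claims are vacuous here.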

I'd also note that a lot of papers on what transformers can or can't do are misleading, often misunderstood, or so abstracted from transformers as implemented and used that they amount to pure theory.