krackers a day ago

https://www.lesswrong.com/posts/gTZ2SxesbHckJ3CkF/transforme... explains this in more detail; see the part about the mixed-state presentation and belief synchronization.

>Another way to think about our claim is that transformers perform two types of inference: one to infer the structure of the data-generating process, and another meta-inference to update its internal beliefs over which state the data-generating process is in, given some history of finite data (i.e. the context window). This second type of inference can be thought of as the algorithmic or computational structure of synchronizing to the hidden structure of the data-generating process.
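
To make the "second type of inference" concrete, here is a minimal sketch of belief updating over hidden states, assuming the data-generating process is a small hidden Markov model. The two states, the transition and emission probabilities, and the function names `update_belief` and `synchronize` are all made up for illustration; none of them come from the linked post.

```python
# Hypothetical 2-state hidden Markov model; every number here is illustrative.
# TRANSITION[i][j] = P(next state is j | current state is i)
TRANSITION = [[0.9, 0.1],
              [0.2, 0.8]]
# EMISSION[i][token] = P(token is emitted | current state is i)
EMISSION = [{"0": 0.8, "1": 0.2},
            {"0": 0.3, "1": 0.7}]


def update_belief(belief, token):
    """One Bayesian filtering step: condition on the token, then advance one step."""
    # Weight each state by how likely it was to emit the observed token.
    weighted = [b * EMISSION[i][token] for i, b in enumerate(belief)]
    total = sum(weighted)
    posterior = [w / total for w in weighted]
    # Push the posterior through the transition matrix to get the belief
    # over the state that will emit the next token.
    n = len(belief)
    return [sum(posterior[i] * TRANSITION[i][j] for i in range(n)) for j in range(n)]


def synchronize(tokens, prior=(0.5, 0.5)):
    """Return the belief after each prefix of the context, starting from the prior."""
    belief = list(prior)
    history = [belief]
    for token in tokens:
        belief = update_belief(belief, token)
        history.append(belief)
    return history


if __name__ == "__main__":
    for step, belief in enumerate(synchronize("1101")):
        print(step, [round(p, 3) for p in belief])
```

Each entry in the returned history is a point on the probability simplex over hidden states. As the context grows, the belief moves away from the uniform prior toward whichever state best explains the tokens seen so far, which is the sense of "synchronizing" in the quote.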

tlarkworthy 6 hours ago

Oh yeah, I read that article and could not find it again. Thank you.

It really opened my mind to what is special about transformers.