Remix.run Logo
kittikitti 2 days ago

As someone who developed chatbots with HMM's and the Transformers algorithms, this is a great and succinct answer. The paper, Attention Is All You Need, solved this drawback.

vjerancrnjak 2 days ago | parent [-]

Markov Random Fields also do that.

Difference is obviously there but nothing prevents you from undirected conditioning of long range dependencies. There’s no need to chain anything.

The problem from a math standpoint is that it’s an intractable exercise. The moment you start relaxing the joint opt problem you’ll end up at a similar place.