kittikitti 2 days ago
As someone who has developed chatbots with HMMs and Transformer-based models, I think this is a great and succinct answer. The paper "Attention Is All You Need" addressed this drawback.
vjerancrnjak 2 days ago | parent
Markov Random Fields also do that. The difference is obviously there, but nothing prevents you from doing undirected conditioning on long-range dependencies; there's no need to chain anything. The problem, from a math standpoint, is that exact inference is intractable. The moment you start relaxing the joint optimization problem, you'll end up in a similar place.
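To make the intractability point concrete, here's a minimal sketch (with made-up potential values) of a binary MRF that couples its first and last node directly, with no chain in between. Exact inference requires the partition function Z, a sum over all 2^n configurations, which is exactly the exponential blow-up being described:

```python
import itertools
import math

# Toy binary MRF over n nodes: a local potential on each node plus one
# long-range pairwise potential linking node 0 and node n-1 directly.
# The potential values are hypothetical, chosen for illustration.
n = 10
theta_local = 0.3       # per-node bias toward state 1
theta_longrange = 1.5   # reward when node 0 and node n-1 agree

def energy(x):
    e = theta_local * sum(x)
    if x[0] == x[-1]:
        e += theta_longrange
    return e

# The partition function Z sums over all 2^n configurations -- this
# brute-force enumeration is why exact MRF inference doesn't scale.
configs = list(itertools.product([0, 1], repeat=n))
Z = sum(math.exp(energy(x)) for x in configs)

# Exact marginal P(x_0 = 1), again by brute force.
p1 = sum(math.exp(energy(x)) for x in configs if x[0] == 1) / Z
print(round(p1, 4))
```

At n = 10 this is 1024 terms and trivial; at n = 100 it is 2^100 terms, which is why one resorts to relaxations or approximate inference (loopy BP, mean field, sampling) and gives up on the exact joint optimum.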