Remix clone Hacker News

	▲	kittikitti 2 days ago
		As someone who has developed chatbots with both HMMs and Transformers, I think this is a great, succinct answer. The paper "Attention Is All You Need" solved this drawback.
	▲	vjerancrnjak 2 days ago
		Markov Random Fields also do that. There are obviously differences, but nothing prevents you from conditioning on long-range dependencies in an undirected way. There's no need to chain anything. The problem, from a math standpoint, is that exact inference is intractable. The moment you start relaxing the joint optimization problem, you'll end up at a similar place.
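		To make the intractability point concrete, here is a minimal sketch under my own assumptions: binary variables in {-1, +1} with pairwise couplings, and the names `partition_function` and `edges` invented for illustration. Any pair of variables can be coupled directly, so long-range dependencies need no chain, but computing the normalizer exactly means summing over all 2**n joint states.

```python
import itertools
import math


def partition_function(n, edges):
    """Brute-force normalizer Z for a toy binary pairwise MRF.

    n     -- number of variables, each taking a value in {-1, +1}
    edges -- dict mapping (i, j) -> coupling weight; i and j can be
             arbitrarily far apart, so no chaining is required
    """
    z = 0.0
    # Enumerate every joint configuration: 2**n of them.
    for state in itertools.product((-1, 1), repeat=n):
        # Unnormalized log-score: sum of pairwise couplings.
        score = sum(w * state[i] * state[j] for (i, j), w in edges.items())
        z += math.exp(score)
    return z


# A direct coupling between the first and last of 10 variables.
z = partition_function(10, {(0, 9): 0.5})
```

		This is fine for n = 10 and hopeless for sequence lengths of interest, which is why the joint problem gets relaxed or approximated in practice.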