They do look pretty good compared to the two other linear (non-Transformer) models. Conventional attention is hard to beat on benchmarks, but it is quadratic in both time and memory with respect to sequence length.
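
For context, here is a minimal NumPy sketch of standard scaled dot-product attention (not any particular model's implementation): the n×n score matrix is what makes it quadratic in sequence length, in both compute and memory.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention over (seq_len, d) arrays.

    The scores matrix is (seq_len, seq_len), which is where the
    O(n^2) time and memory cost comes from.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # (n, n) -- quadratic in sequence length
    weights = softmax(scores, axis=-1)  # still (n, n)
    return weights @ V                  # (n, d)

# Toy usage: doubling n quadruples the size of `scores`.
n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(attention(Q, K, V).shape)  # (1024, 64)
```

Linear-attention variants avoid ever materializing that n×n matrix, trading some benchmark quality for O(n) scaling.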