I wonder how different their method actually is from other sub-quadratic sparse attention methods like Reformer (LSH attention) [1] and Routing Transformer (clustering-based routing attention) [2]; a rough sketch of the bucketing idea those two share is below.
[1]: https://arxiv.org/abs/2001.04451
[2]: https://arxiv.org/abs/2003.05997
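
For context, here's a minimal toy sketch (my own code, not either paper's implementation) of the idea both share: assign positions to buckets, then run ordinary softmax attention only within each bucket, so the cost depends on bucket size rather than the full sequence. The function name, `n_buckets`, and the random-hyperplane hash are illustrative assumptions; Reformer hashes tied query/key vectors with LSH, while Routing Transformer clusters them with learned centroids.

```python
import numpy as np

def bucketed_attention(q, k, v, n_buckets=8, seed=0):
    """Toy single-head attention restricted to hash buckets.

    q, k, v: (seq_len, d) arrays. Buckets come from a random-hyperplane hash
    (LSH-style, roughly as in Reformer, where queries and keys are tied);
    Routing Transformer instead assigns positions to learned cluster centroids,
    but the within-bucket attention step is analogous.
    """
    rng = np.random.default_rng(seed)
    seq_len, d = q.shape
    n_bits = int(np.ceil(np.log2(n_buckets)))
    planes = rng.normal(size=(d, n_bits))
    # Sign pattern of a few random projections -> bucket id per position.
    bits = (q @ planes > 0).astype(int)
    bucket_ids = bits @ (2 ** np.arange(n_bits))

    out = np.zeros_like(v)
    for b in np.unique(bucket_ids):
        idx = np.where(bucket_ids == b)[0]
        # Full softmax attention, but only among positions in this bucket.
        scores = q[idx] @ k[idx].T / np.sqrt(d)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[idx] = weights @ v[idx]
    return out

# Quick usage check on random data (self-attention, so q = k = v).
x = np.random.default_rng(1).normal(size=(128, 16))
y = bucketed_attention(x, x, x)
print(y.shape)  # (128, 16)
```

In both papers the bucket size is kept roughly constant (so the number of buckets grows with sequence length), which is what makes the overall cost sub-quadratic rather than O(n^2).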