
	▲	tripplyons 2 days ago
		Have you explored chunkwise parallel approaches? They use the O(n log n) parallel algorithm within each chunk (subsequence) and update between chunks recurrently, like the O(n) recurrent algorithm. These are usually the fastest kernels for these kinds of RNNs. Here is the best explanation I have seen: https://sustcsonglin.github.io/blog/2024/deltanet-2/
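
		To make the chunkwise idea concrete, here is a rough sketch. It assumes a scalar gated linear recurrence h_t = a_t * h_{t-1} + b_t, which is a simplification and not the DeltaNet update from the linked post. The function names are made up for illustration. The per-chunk phase is written as a plain loop for readability; in a real kernel that phase is where the parallel scan (or a matmul form) would go, since it does not depend on the incoming state.

```python
def recurrent_scan(a, b, h0=0.0):
    """Reference O(n) recurrence: h_t = a_t * h_{t-1} + b_t."""
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def chunk_local(a, b):
    """Work for one chunk that does NOT depend on the incoming state.

    Returns the cumulative decay prod(a_1..a_t) and the local states
    computed as if the chunk started from h = 0. A real kernel would
    compute this with a parallel algorithm; a loop is used here only
    to keep the sketch short.
    """
    decay, local = [], []
    d, h = 1.0, 0.0
    for a_t, b_t in zip(a, b):
        d *= a_t
        h = a_t * h + b_t
        decay.append(d)
        local.append(h)
    return decay, local


def chunkwise_scan(a, b, chunk_size, h0=0.0):
    """Chunkwise form: independent per-chunk work, then a recurrent carry."""
    # Phase 1: every chunk is independent, so these could all run in parallel.
    chunks = [
        chunk_local(a[i:i + chunk_size], b[i:i + chunk_size])
        for i in range(0, len(a), chunk_size)
    ]
    # Phase 2: sequential pass over chunks, carrying one state between them.
    out, carry = [], h0
    for decay, local in chunks:
        # h_t = local_t + (product of a inside the chunk up to t) * carry
        out.extend(l + d * carry for d, l in zip(decay, local))
        carry = out[-1]
    return out


if __name__ == "__main__":
    a = [0.9, 0.5, 1.1, 0.7, 0.3, 0.8, 1.0]
    b = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5, -1.0]
    ref = recurrent_scan(a, b)
    chunked = chunkwise_scan(a, b, chunk_size=3)
    print(all(abs(x - y) < 1e-9 for x, y in zip(ref, chunked)))
```

		The sequential part only runs once per chunk instead of once per token, which is the trade-off the chunk size controls: chunk_size=1 degenerates to the plain recurrent algorithm, and one chunk covering the whole sequence is the fully parallel form.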