Remix clone Hacker News


Palmik 4 hours ago
It's a fair question, even if it might come from a misunderstanding. For example, DeepSeek 3.2, which uses sparse attention [1], is not only faster than the regular 3.1 at long context lengths but also seems to be better (perhaps because it reduces noise?).

[1] It still uses a quadratic router, but the router is small, so it scales well in practice. https://api-docs.deepseek.com/news/news250929
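The "small quadratic router" idea can be illustrated as two stages. First, a cheap scorer looks at every earlier token; this is still quadratic in sequence length, but it works in a tiny dimension. Second, ordinary attention runs over only the top-k tokens that scorer picks. The sketch below is a simplified, hypothetical plain-Python illustration, not DeepSeek's implementation. The function names, the plain dot-product scorer (`q_idx`/`k_idx`), and the `top_k` values are assumptions chosen for clarity.

```python
import heapq
import math
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q_t, positions, k, v):
    """Scaled dot-product attention of one query over the given key/value positions."""
    scale = 1.0 / math.sqrt(len(q_t))
    weights = softmax([dot(q_t, k[s]) * scale for s in positions])
    return [sum(w * v[s][j] for w, s in zip(weights, positions)) for j in range(len(v[0]))]


def dense_attention(q, k, v):
    """Ordinary causal attention: token t attends to every position 0..t."""
    return [attend(q[t], list(range(t + 1)), k, v) for t in range(len(q))]


def sparse_attention(q, k, v, q_idx, k_idx, top_k):
    """Two-stage causal sparse attention (hypothetical, simplified).

    q_idx/k_idx are small "router" vectors, much lower-dimensional than q/k.
    Stage 1 scores all causal pairs with them: quadratic in length, but cheap.
    Stage 2 runs full attention over only the top_k highest-scoring positions.
    """
    out = []
    for t in range(len(q)):
        # Stage 1: cheap score for every earlier position (including t itself).
        scores = [dot(q_idx[t], k_idx[s]) for s in range(t + 1)]
        # Keep at most top_k positions, ranked by router score.
        selected = heapq.nlargest(top_k, range(t + 1), key=scores.__getitem__)
        # Stage 2: full-dimension attention over the selected positions only.
        out.append(attend(q[t], selected, k, v))
    return out


if __name__ == "__main__":
    random.seed(0)
    L, d, d_idx = 16, 8, 2  # sequence length, model dim, router dim (illustrative)

    def rand_vecs(n, dim):
        return [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]

    q, k, v = rand_vecs(L, d), rand_vecs(L, d), rand_vecs(L, d)
    q_idx, k_idx = rand_vecs(L, d_idx), rand_vecs(L, d_idx)

    sparse = sparse_attention(q, k, v, q_idx, k_idx, top_k=4)
    dense = dense_attention(q, k, v)
    # Largest per-element gap between sparse and dense outputs (data-dependent).
    print(max(abs(a - b) for ra, rb in zip(sparse, dense) for a, b in zip(ra, rb)))
```

When `top_k` is at least the sequence length, every earlier token is selected and the result matches dense causal attention. With a small fixed `top_k`, the expensive full-dimension step costs roughly O(L · top_k · d) instead of O(L² · d). Only the low-dimensional scoring stays O(L² · d_idx), which is why a quadratic router can still scale well in practice.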