NitpickLawyer | 8 hours ago
Yes, it's described in this section: https://huggingface.co/Qwen/Qwen3.5-397B-A17B#processing-ult... YaRN, but with some caveats: current implementations may reduce performance on short contexts, so only enable YaRN for long tasks. Interesting that they're serving both on OpenRouter, and the -plus is a bit cheaper for <256k ctx, so they must have more inference goodies packed in there (proprietary). We'll see where the 3rd-party inference providers settle wrt cost.
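For reference, here's roughly what "only use YaRN for long tasks" looks like in practice. This is a minimal sketch of the config.json patch Qwen has documented for earlier releases; the factor and original_max_position_embeddings values below are assumptions, check the model card for this checkpoint's actual numbers:

    # Sketch: enable YaRN by patching the model's config.json, following
    # the recipe Qwen published for earlier releases. Numeric values are
    # assumptions; the model card gives the real ones for this checkpoint.
    import json

    CONFIG_PATH = "Qwen3.5-397B-A17B/config.json"  # hypothetical local path

    with open(CONFIG_PATH) as f:
        cfg = json.load(f)

    # Add this only for long-context jobs; with YaRN always on,
    # short-context quality can regress (the caveat above).
    cfg["rope_scaling"] = {
        "rope_type": "yarn",
        "factor": 4.0,                               # assumed scaling factor
        "original_max_position_embeddings": 262144,  # assumed native ctx
    }

    with open(CONFIG_PATH, "w") as f:
        json.dump(cfg, f, indent=2)

The short-context caveat comes from the fact that, at least per Qwen3's docs, the common open-source frameworks implement static YaRN: the scaling factor is applied regardless of input length, so short prompts pay the cost too.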
ggcr | 7 hours ago
Thanks, I'd totally missed that. It's basically the same as with the Qwen2.5 and Qwen3 series, but this time with 1M context and 200k native, yay :)