kmeisthax | 6 days ago
Is there any evidence that GPT-4.1 is using RoPE to scale context? Also, I don't know about Qwen, but I know Llama 4 has severe performance issues, so I wouldn't use that as an example.
omneity | 6 days ago
I'm not sure about public evidence, but the memory requirements alone for training natively on 1M-token windows make it a very unrealistic proposition compared to RoPE scaling. And as I mentioned, RoPE is essential for long context anyway; you can't train for it the "normal" way. Please see the paper I linked previously for more context (pun not intended) on RoPE. Re: Llama 4, see the sibling comment.
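For intuition, here is a minimal sketch of one common RoPE-scaling approach (position interpolation): positions are divided by a scale factor so a long sequence is squeezed back into the position range seen during pretraining, after which a short fine-tune suffices instead of training from scratch on full-length windows. The context lengths and scale factor below are made-up illustration values, not GPT-4.1's actual recipe.

```python
import torch

def rope_frequencies(head_dim: int, base: float = 10000.0) -> torch.Tensor:
    # Standard RoPE inverse frequencies: one rotation rate per pair of dims.
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

def rope_angles(positions: torch.Tensor, inv_freq: torch.Tensor,
                scale: float = 1.0) -> torch.Tensor:
    # Position interpolation: dividing positions by `scale` maps a long
    # sequence back into the pretraining position range.
    return torch.outer(positions.float() / scale, inv_freq)

def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    # x: (seq_len, head_dim); rotate each consecutive pair of features.
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)

if __name__ == "__main__":
    # Hypothetical example: model pretrained at 8k, extended to 32k (4x scale).
    head_dim, train_ctx, target_ctx = 64, 8192, 32768
    inv_freq = rope_frequencies(head_dim)
    positions = torch.arange(target_ctx)
    angles = rope_angles(positions, inv_freq, scale=target_ctx / train_ctx)
    q = torch.randn(target_ctx, head_dim)
    print(apply_rope(q, angles).shape)  # torch.Size([32768, 64])
```

Other variants (NTK-aware scaling, YaRN) adjust the frequency base instead of, or in addition to, interpolating positions, but the idea is the same: extend usable context without paying for full-length pretraining.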