Remix clone Hacker News

	Alifatisk 4 hours ago
		With Qwen-3-max thinking, I remember inference becoming veeery slow as you pushed toward 1M context; by 300k tokens you would already notice the degradation. But of course, I was using Qwen Chat, so it could be a resource allocation thing.