Remix clone Hacker News

	ACCount37 3 hours ago
		None whatsoever. It's a "let's find a task humans are decent at, but modern AIs are still very bad at" kind of adversarial benchmark. The exact coverage of this one is: spatial reasoning across multiple turns, agentic explore/exploit with rule inference and preplanning. Directly targeted against the current generation of LLMs.