Remix clone Hacker News

	rfw300 3 hours ago
		Interesting project, but the lack of any actual benchmark results on existing models/agents is disappointing.
	frabonacci 3 hours ago
		Fair point. We just open-sourced this, so benchmark results are coming. We're already working with labs on evals, focusing on tasks that are more realistic than OSWorld/Windows Agent Arena and curated with actual workers. If you want to run your agent on it, we'd love to include your results.