Remix clone Hacker News


	causal 2 days ago
		That ARC-AGI score is a little suspicious. That's a really tough benchmark for AI. I'm curious whether there were improvements to the test harness, because that's a wild jump in general problem-solving ability for an incremental update.
	woeirua 2 days ago
		They're clearly building better training datasets and doing extensive RL on these benchmarks over time. The out-of-distribution performance is still awful.
	taurath 2 days ago
		I don't think their words mean much of anything; only the models' behavior does. Still waiting on Full Self-Driving myself.