Remix clone Hacker News

	▲	meroes a year ago
		Aren’t prompts seeking to offload reasoning though? Is that really a fair data point for this?
	▲	vidarh a year ago
		When people are claiming they can't reason, then yes, benchmarking against an average human should be the bare minimum. Arguably they should be benchmarked against below-average humans too, because the bar below which we'd be willing to argue that a human can't reason is very low. If you're testing to see whether it can replace certain types of work, then it depends on where you would normally set the bar for that type of work. You could offload a whole lot of work to something that can reliably reason at a level below that of an average human.