It's not even an anthropomorphization: in RLHF-like setups the reward signal is usually, quite literally, "did the user think the output was good".
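
To make that concrete, here's a minimal sketch (not any particular library's API, all names are illustrative) of the standard way such a reward model is fit: pairs of outputs are shown to human raters, and the model is trained so the rater-preferred output scores higher, i.e. the reward is literally a learned proxy for "which one did the human like".

```python
import torch.nn.functional as F

def preference_loss(reward_model, prompt, chosen, rejected):
    """Bradley-Terry style pairwise loss. `reward_model` is assumed to map
    (prompt, response) to a scalar score; `chosen` is the output the human
    rater preferred, `rejected` the one they didn't."""
    r_chosen = reward_model(prompt, chosen)      # score for the rater-preferred output
    r_rejected = reward_model(prompt, rejected)  # score for the dispreferred output
    # -log sigmoid(r_chosen - r_rejected): minimized when the model's scores
    # agree with the human judgment of which output was "good"
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

The policy is then optimized against this learned score, so "maximize reward" really does cash out as "maximize predicted human approval".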