Remix clone Hacker News

	▲	xandrius 5 hours ago
		I think people are misunderstanding reward functions and LLMs. LLMs don't actually have a reward system like some other ML models.
	▲	storus 4 hours ago
		They are trained with one, and when you look at DPO you can say they contain an implicit one as well.
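
To make the "implicit one" concrete: in DPO, the quantity that behaves like a reward is the scaled log-probability ratio between the trained policy and a frozen reference model, beta * (log pi(y|x) - log pi_ref(y|x)), up to a term that depends only on the prompt. The following is a minimal sketch of that idea, not anyone's production code. The function names `implicit_reward` and `dpo_loss`, the default `beta=0.1`, and the plain-float log-probabilities are all assumptions chosen for illustration.

```python
import math


def implicit_reward(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """Implicit DPO reward for one response (hypothetical helper).

    logp_policy and logp_ref are the summed log-probabilities of the same
    response under the trained policy and the frozen reference model.
    The prompt-only normalizing term is dropped because it cancels when
    two responses to the same prompt are compared.
    """
    return beta * (logp_policy - logp_ref)


def dpo_loss(
    logp_policy_chosen: float,
    logp_ref_chosen: float,
    logp_policy_rejected: float,
    logp_ref_rejected: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one preference pair: -log(sigmoid(reward margin))."""
    margin = implicit_reward(logp_policy_chosen, logp_ref_chosen, beta) - implicit_reward(
        logp_policy_rejected, logp_ref_rejected, beta
    )
    # -log(sigmoid(m)) == log(1 + exp(-m)), computed in a numerically stable way.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

No separate reward model appears anywhere in this sketch: the preference signal is expressed directly through the policy's own log-probabilities, which is the sense in which the trained model can be said to contain a reward implicitly.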