Remix clone Hacker News

	▲	mountainriver 4 hours ago
		Why not use this instead of KL in reinforcement learning?
	▲	anvuong 2 hours ago
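		JSD is a symmetrized, smoothed variant of KL: it averages KL(P‖M) and KL(Q‖M), where M = (P + Q)/2 is the mixture of the two distributions. (The plain sum of forward KL and reverse KL is the Jeffreys divergence, not JSD.) In reinforcement learning, what we usually want is the optimal action, i.e. the action that maximizes the reward. That translates to so-called "mode-seeking" optimization, which is what the reverse KL gives you; a symmetric divergence also carries mass-covering pressure from the forward direction.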
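
To make the mode-seeking point concrete, here is a minimal sketch on a toy discrete example. It assumes the standard definition of JSD as the average KL to the mixture M = (P + Q)/2; the distributions `p`, `q_mode`, and `q_cover` are made-up illustrations, not anything from the thread. `p` is a bimodal target, `q_mode` commits to one mode, and `q_cover` spreads mass everywhere.

```python
import math

def kl(p, q):
    """KL(p || q) in nats for discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: average KL to the mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def jeffreys(p, q):
    """Jeffreys divergence: forward KL plus reverse KL."""
    return kl(p, q) + kl(q, p)

# Hypothetical toy distributions over four actions.
p = [0.49, 0.01, 0.01, 0.49]        # bimodal target
q_mode = [0.97, 0.01, 0.01, 0.01]   # commits to a single mode
q_cover = [0.25, 0.25, 0.25, 0.25]  # covers everything uniformly

# Forward KL(p || q) penalizes q for missing mass where p has it,
# so it prefers the covering candidate.
forward_prefers_cover = kl(p, q_cover) < kl(p, q_mode)

# Reverse KL(q || p) penalizes q for putting mass where p has little,
# so it prefers the single-mode candidate.
reverse_prefers_mode = kl(q_mode, p) < kl(q_cover, p)

print("forward KL prefers q_cover:", forward_prefers_cover)
print("reverse KL prefers q_mode:", reverse_prefers_mode)
print("JSD(p, q_mode):", jsd(p, q_mode))
print("Jeffreys(p, q_mode):", jeffreys(p, q_mode))
```

On this example the two KL directions rank the candidates in opposite order, which is the forward (mass-covering) versus reverse (mode-seeking) distinction. JSD is symmetric and bounded by ln 2, while the Jeffreys divergence is symmetric but unbounded, so the two are not interchangeable.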