Remix clone Hacker News

new | show | ask | jobs Github

	▲	andy99 an hour ago
		It’s supervised learning rather than RL, you’re just training to labels. It doesn’t work (doesn’t generalize) because there is no guarantee or even expectation that any causal relationship is learned, it’s just whatever convenient pattern gets the lowest loss. There is lots of research on this for those unaware.