Remix clone Hacker News

new | show | ask | jobs Github

	▲	jmalicki 5 hours ago
		Unsupervised is basically clustering. Alphago is RL - winning or losing a game is a form of supervision. Unsupervised is something where there is no intrinsic reward signal. In pre training, predicting the next token and seeing that it matches is a reward signal, hence it is self supervised.