| ▲ | jmalicki 5 hours ago | |
Unsupervised is basically clustering. Alphago is RL - winning or losing a game is a form of supervision. Unsupervised is something where there is no intrinsic reward signal. In pre training, predicting the next token and seeing that it matches is a reward signal, hence it is self supervised. | ||