DoctorOetker 4 days ago

regarding "supervised", it is a bit of a small nuance.

Traditional "supervised" training, required the dataset to be annotated with labels (good/bad, such-and-such a bounding box in an image, ...) which cost a lot of human labor to produce.

When people speak of "unsupervised" training, I actually consider it a misnomer: its historically grown, and the term will not go away quickly, but a more apt name would have been "label-free" training.

For example, consider a corpus of human-written text (books, blogs, ...) without additional labels (verb annotations, subject annotations, ...).

Now consider someone proposing to use next-token prediction; clearly it doesn't require additional labeling. Is it supervised? Nobody calls it supervised under the current convention, but one may view next-token prediction on a bare text corpus as a trick to turn an unlabeled dataset into trillions of supervised prediction tasks. Given this N-gram of preceding tokens, what does the model predict as the next token? And what does the corpus actually say the next token is? Let's use this actual next token as if it were a "supervised" (labeled) exercise, as in the sketch below.
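
To make the point concrete, here is a minimal Python sketch (toy whitespace "tokenizer" and a made-up six-word corpus, both just for illustration) showing how a bare, unlabeled corpus yields (context, next-token) pairs that look exactly like labeled supervised examples:

    # Toy sketch: turning an unlabeled text corpus into supervised
    # (input, label) pairs via next-token prediction.

    corpus = "the cat sat on the mat"       # stand-in for an unlabeled corpus
    tokens = corpus.split()                 # toy tokenizer: whitespace split

    context_size = 3                        # N preceding tokens used as input

    pairs = []
    for i in range(context_size, len(tokens)):
        context = tokens[i - context_size:i]   # the N-gram of preceding tokens (the "input")
        next_token = tokens[i]                  # what the corpus actually says next (the "label")
        pairs.append((context, next_token))

    for context, next_token in pairs:
        print(context, "->", next_token)
    # (['the', 'cat', 'sat'], 'on')
    # (['cat', 'sat', 'on'], 'the')
    # (['sat', 'on', 'the'], 'mat')

The labels here were never annotated by a human; they fall out of the corpus itself, which is the whole point.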

cubefox 4 days ago | parent [-]

That's also why LeCun promoted the term "self-supervised" a while ago, with some success.