qoez | 2 days ago
From a core OpenAI insider who has likely trained both very large Markov models and large transformers: https://x.com/unixpickle/status/1935011817777942952 Untwittered: "A Markov model and a transformer can both achieve the same loss on the training set. But only the transformer is smart enough to be useful for other tasks." This invalidates the claim that "all transformers are doing is memorizing their training data".
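To make the contrast concrete (my own toy illustration, not from the tweet): a bigram Markov model can drive its training loss to near zero just by memorizing transition counts, but it has nothing to say about any context it never saw — whereas a transformer's learned representations can transfer to other tasks.

```python
from collections import defaultdict, Counter

# Hypothetical sketch: a bigram Markov model "fits" its training text
# by memorizing which token followed which, with no generalization.
def train_bigram(tokens):
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict(counts, prev):
    # Return the most frequent continuation seen after `prev`,
    # or None if this context never appeared in training.
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]

train = "the cat sat on the mat".split()
model = train_bigram(train)

print(predict(model, "cat"))   # memorized continuation: sat
print(predict(model, "dog"))   # unseen context: None
```

The memorized pairs reproduce the training set exactly (same training loss as any stronger model could reach), yet the model is useless off-distribution — which is precisely the gap the tweet points at.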