iNic 4 days ago
At the time, getting complete sentences was extremely difficult! N-gram models were essentially the best we had.
albertzeyer 4 days ago
No, it was not difficult at all. I really wonder why they have such a bad example here for GPT-1. See for example this popular blog post: https://karpathy.github.io/2015/05/21/rnn-effectiveness/

That was in 2015, with RNN LMs, which are all much, much weaker in that blog post than GPT-1. Even looking at those examples in 2015, you could maybe see the future potential. But no one was thinking that scaling up would work as effectively as it does.

2015 is also by far not the first time we had such LMs. Mikolov had been doing RNN LMs since 2010, and Sutskever in 2011. You might find even earlier examples of NN LMs. (Before that, the state of the art was mostly N-grams.)
| ||||||||
macleginn 4 days ago
N-gram models had been superseded by RNNs by that time. RNNs struggled with long-range dependencies, but useful n-grams were essentially capped at n=5 because of sparsity, and RNNs could do better than that.
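A minimal sketch of the sparsity point (my own toy example, not from the thread): counting n-grams over a tiny corpus shows that as n grows, nearly every n-gram occurs only once, so count-based probability estimates become useless without heavy smoothing.

```python
# Toy illustration of n-gram sparsity; corpus and function names are made up.
from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

corpus = "the cat sat on the mat and the dog sat on the rug".split()

for n in (2, 5):
    counts = ngram_counts(corpus, n)
    singletons = sum(1 for c in counts.values() if c == 1)
    print(f"n={n}: {len(counts)} distinct n-grams, {singletons} seen only once")

# As n grows, almost every n-gram is unique, so maximum-likelihood estimates
# carry no useful signal -- the sparsity cap the comment above refers to.
```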