arugulum | 3 days ago
My statement was:

> a (fine-tuned) base Transformer model just trivially blowing everything else out of the water

The model in "Attention is All You Need" was trained specifically for translation, and it blew all other translation models out of the water. It was not fine-tuned for tasks other than the one it had been trained on from scratch. GPT-1/BERT were significant because they showed that you can pretrain one base model and then use it for "everything".