| ▲ | irthomasthomas 5 hours ago | |
It means for 4.7 they trained a new base model with different architecture, different pre-training data (later knowledge cutoff), and a new tokenizer. Vs finetuning an existing model, which was the case for 4.6, and probably for 4.8. | ||
| ▲ | dominotw 3 hours ago | parent | next [-] | |
do you mean pre training? so 4.8 is just post training of an old pretrained model? btw where do they tell you how they trained the model. | ||
| ▲ | 3 hours ago | parent | prev [-] | |
| [deleted] | ||