| ▲ | irthomasthomas 5 hours ago |
| Given that 4.7 was a brand new model, trained from scratch with a unique architecture and tokenization scheme, I don't see the same pattern. It seems arbitrary. |
|
| ▲ | dominotw 5 hours ago | parent [-] |
| i dont understand the nuances here. what does this mean. 4.8 is trained on same model as previous one then? what does brand new mean. |
| |
| ▲ | irthomasthomas 5 hours ago | parent [-] | | It means for 4.7 they trained a new base model with different architecture, different pre-training data (later knowledge cutoff), and a new tokenizer.
Vs finetuning an existing model, which was the case for 4.6, and probably for 4.8. | | |
| ▲ | dominotw 2 hours ago | parent | next [-] | | do you mean pre training? so 4.8 is just post training of an old pretrained model? btw where do they tell you how they trained the model. | |
| ▲ | 3 hours ago | parent | prev [-] | | [deleted] |
|
|