lucrbvi 6 hours ago
Some people are speculating that Opus 4.7 is distilled from Mythos because of the new tokenizer (which would mean Opus 4.7 is a new base model, not just an improved Opus 4.6).
aesthesia 6 hours ago | parent | next [-]
The new tokenizer is interesting, but it is definitely possible to adapt a base model to a new tokenizer without much additional training, especially if you're distilling from a model that already uses the new tokenizer (see, e.g., https://openreview.net/pdf?id=DxKP2E0xK2).
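To make the idea concrete, here's a toy sketch of one common tokenizer-adaptation trick: initialize each new token's embedding as the mean of the embeddings of the old-tokenizer pieces it covers, so the adapted model starts close to the original and only needs a short continued-training run. The vocabularies and the piece mapping below are made up for illustration; real methods decompose each new token with the old tokenizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: old vocab of 5 subword pieces with learned 4-d embeddings.
old_vocab = {"hel": 0, "lo": 1, "wor": 2, "ld": 3, "!": 4}
old_emb = rng.standard_normal((len(old_vocab), 4))

# Hypothetical new tokenizer that merges pieces into longer tokens.
new_token_pieces = {
    "hello": ["hel", "lo"],
    "world": ["wor", "ld"],
    "!": ["!"],
}

# Initialize each new embedding as the mean of its old-piece embeddings.
new_emb = np.stack([
    old_emb[[old_vocab[p] for p in pieces]].mean(axis=0)
    for pieces in new_token_pieces.values()
])

print(new_emb.shape)  # one row per new-vocab token: (3, 4)
```

Tokens shared between the two vocabularies ("!" here) keep their embedding exactly, which is part of why the continued training can be short.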
alecco 6 hours ago | parent | prev [-]
Yes, I was thinking that. But it could just as well be the other way around: using the pretrained 4.7 (1T?) to speed up Mythos (10T?) pretraining by ~70%. It's speculative decoding, but for training. If they did this at that scale, it's quite an achievement, because training is very fragile with these kinds of tricks.