msp26 2 days ago

They don't have the compute to make Mythos generally available: that's all there is to it. The exclusivity is also nice from a marketing pov.

alecco 2 days ago | parent | next [-]

They don't have demand for the price it would require for inference.

They are definitely distilling it into a much smaller model that is ~98% as good, like everybody does.

lucrbvi 2 days ago | parent | next [-]

Some people are speculating that Opus 4.7 is distilled from Mythos due to the new tokenizer (it means Opus 4.7 is a new base model, not just an improved Opus 4.6)

aesthesia 2 days ago | parent | next [-]

The new tokenizer is interesting, but it definitely is possible to adapt a base model to a new tokenizer without too much additional training, especially if you're distilling from a model that uses the new tokenizer (see, e.g., https://openreview.net/pdf?id=DxKP2E0xK2).

ACCount37 2 days ago | parent [-]

Not impossible, but you have to be at least a little bit mad to deploy tokenizer replacement surgery at this scale.

They also changed the image encoder, so I'm thinking "new base model". Whatever base that was powering 4.5/4.6 didn't last long then.

alecco 2 days ago | parent | prev [-]

Yes, I was thinking that. But it could just as well be the other way around: using the pretrained 4.7 (1T?) to speed up ~70% Mythos (10T?) pretraining.

It's just speculative decoding but for training. If they did it at this scale, it's quite an achievement, because training is very fragile when doing these kinds of tricks.

ACCount37 2 days ago | parent [-]

Reverse distillation. Using small models to bootstrap large models. Get richer signal early in the run when gradients are hectic, get the large model past the early training instability hell. Mad but it does work somewhat.

Not really similar to speculative decoding?

I don't think that's what they've done here though. It's still black magic, I'm not sure if any lab does it for frontier runs, let alone 10T scale runs.

baq 2 days ago | parent | prev | next [-]

> They don't have demand for the price it would require for inference.

Citation needed. I find it hard to believe; I think there are more than enough people willing to spend $100/Mtok for frontier capabilities to dedicate a couple racks or aisles.


CodingJeebus 2 days ago | parent | prev [-]

I've read so many conflicting things about Mythos that it's become impossible to make any real assumptions about it. I don't think it's vaporware necessarily, but the whole "we can't release it for safety reasons" feels like the next level of "POC or STFU".