| ▲ | benleejamin 10 hours ago |
| For anyone who was wondering about Mythos release plans: > What we learn from the real-world deployment of these safeguards will help us work towards our eventual goal of a broad release of Mythos-class models. |
|
| ▲ | msp26 9 hours ago | parent | next [-] |
| They don't have the compute to make Mythos generally available: that's all there is to it. The exclusivity is also nice from a marketing pov. |
| |
| ▲ | alecco 9 hours ago | parent | next [-] | | They don't have demand at the price inference would require. They are definitely distilling it into a much smaller model that's ~98% as good, like everybody does. | | |
| ▲ | lucrbvi 9 hours ago | parent | next [-] | | Some people are speculating that Opus 4.7 is distilled from Mythos because of the new tokenizer (which would mean Opus 4.7 is a new base model, not just an improved Opus 4.6). | | |
| ▲ | aesthesia 9 hours ago | parent | next [-] | | The new tokenizer is interesting, but it's definitely possible to adapt a base model to a new tokenizer without too much additional training, especially if you're distilling from a model that already uses the new tokenizer (see, e.g., https://openreview.net/pdf?id=DxKP2E0xK2). | | |
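A minimal sketch of what such an adaptation could look like (assuming Hugging Face-style tokenizers and a plain PyTorch embedding matrix; the mean-of-subtokens re-initialization here is a common heuristic, not anything confirmed about this model):

    import torch

    def init_embeddings_for_new_vocab(old_tok, new_tok, old_emb):
        """Re-initialize an embedding table for a new vocabulary by mapping
        each new token to the mean of its old-tokenizer decomposition."""
        d_model = old_emb.size(1)
        new_emb = torch.empty(len(new_tok), d_model)
        fallback = old_emb.mean(dim=0)  # used when the old tokenizer yields no ids
        for new_id in range(len(new_tok)):
            piece = new_tok.decode([new_id])
            old_ids = old_tok.encode(piece, add_special_tokens=False)
            new_emb[new_id] = old_emb[old_ids].mean(dim=0) if old_ids else fallback
        return new_emb

The input/output embeddings are the main tokenizer-specific parts of a transformer, so after a re-init like this, a short distillation phase against a teacher that already speaks the new vocabulary is far cheaper than pretraining from scratch.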
| ▲ | ACCount37 7 hours ago | parent [-] | | Not impossible, but you have to be at least a little bit mad to deploy tokenizer replacement surgery at this scale. They also changed the image encoder, so I'm thinking "new base model". Whatever base was powering 4.5/4.6 didn't last long, then. |
| |
| ▲ | alecco 9 hours ago | parent | prev [-] | | Yes, I was thinking that. But it could just as well be the other way around: using the pretrained 4.7 (1T?) to speed up Mythos (10T?) pretraining by ~70%. It's speculative decoding, but for training. If they did this at that scale, it's quite an achievement, because training is very fragile with these kinds of tricks. | | |
| ▲ | ACCount37 9 hours ago | parent [-] | | Reverse distillation: using small models to bootstrap large ones. You get a richer signal early in the run when gradients are hectic, and it gets the large model past the early-training instability hell. Mad, but it does work somewhat. Not really similar to speculative decoding, though? I don't think that's what they've done here. It's still black magic; I'm not sure any lab does it for frontier runs, let alone 10T-scale runs. |
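A rough sketch of that annealed-bootstrapping idea (PyTorch; the linear schedule and loss weighting are purely illustrative, and this assumes the small teacher and large student share a tokenizer):

    import torch.nn.functional as F

    def bootstrap_loss(student_logits, teacher_logits, targets, step, warmup_steps=10_000):
        """Early in the run, lean on the small pretrained model's soft targets;
        anneal back to plain next-token cross-entropy as training stabilizes."""
        alpha = max(0.0, 1.0 - step / warmup_steps)   # teacher weight decays to 0
        s = student_logits.flatten(0, 1)              # (batch*seq, vocab)
        t = teacher_logits.flatten(0, 1)
        ce = F.cross_entropy(s, targets.flatten())    # hard labels from the data
        kl = F.kl_div(F.log_softmax(s, dim=-1),       # soft labels from the teacher
                      F.softmax(t, dim=-1), reduction="batchmean")
        return alpha * kl + (1.0 - alpha) * ce

Unlike speculative decoding, nothing here verifies the small model's outputs against the large one; the teacher just provides a denser training target while the big model's gradients are still noisy.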
|
| |
| ▲ | baq 9 hours ago | parent | prev | next [-] | | > They don't have demand at the price inference would require. Citation needed. I find that hard to believe; I think there are more than enough people willing to spend $100/Mtok for frontier capabilities to justify dedicating a couple of racks or aisles. | |
| ▲ | systemsweird 8 hours ago | parent | prev [-] | | [dead] |
| |
| ▲ | CodingJeebus 9 hours ago | parent | prev [-] | | I've read so many conflicting things about Mythos that it's become impossible to draw any real conclusions about it. I don't think it's necessarily vaporware, but the whole "we can't release it for safety reasons" feels like the next level of "POC or STFU". |
|
|
| ▲ | shostack 9 hours ago | parent | prev | next [-] |
Looks like they are adding Peter Thiel-backed ID verification too. https://reddit.com/r/ClaudeAI/comments/1smr9vs/claude_is_abo...
| |
| ▲ | szmarczak 9 hours ago | parent [-] | | You should've commented this on the parent thread for visibility. I had to scroll to find this, as I don't browse r/ClaudeAI regularly. |
|
|
| ▲ | not_ai 10 hours ago | parent | prev | next [-] |
Oh look, it was too powerful to release; now it's just a matter of safeguards. This story sounds a lot like GPT-2.
| |
| ▲ | tabbott 9 hours ago | parent | next [-] | | The original blog post for Mythos did lay out this safeguard testing strategy as part of their plan. | |
| ▲ | hgoel 9 hours ago | parent | prev | next [-] | | This seems needlessly cynical. I don't think they ever said they wouldn't release it. They made it clear that they expect other labs to reach that level sooner or later, and they're just holding it back until they've helped patch enough vulnerabilities. | |
| ▲ | camdenreslink 9 hours ago | parent | prev | next [-] | | My guess is that it is just too expensive to make generally available. Sounds similar to GPT-4.5, which was too expensive to be practical. | |
| ▲ | poszlem 10 hours ago | parent | prev [-] | | It's too powerful now. Once GPT-6 is released it will suddenly, magically, become not too powerful to release. | | |
| ▲ | latentsea 9 hours ago | parent | next [-] | | For a second there I read that as "GTA 6", which got me thinking that maybe the reason GTA 6 hasn't come out in all these years is how dangerous and powerful it's going to be. | | |
| ▲ | mrbombastic 9 hours ago | parent [-] | | Productivity going right back down again. Ah well, they weren't going to pay us more anyway. |
| |
| ▲ | thomasahle 9 hours ago | parent | prev [-] | | Or, you know, they will have improved the safeguards. | | |
|
|
|
| ▲ | jampa 9 hours ago | parent | prev | next [-] |
Mythos release feels like Silicon Valley "don't take revenue" advice: https://www.youtube.com/watch?v=BzAdXyPYKQo "If you show the model, people will ask 'HOW BETTER?' and it will never be enough. The model that was the AGI is suddenly the +5% bench dog. But if you have NO model, you can say you're worried about safety! You're a potential pure play... It's not about how much you research, it's about how much you're WORTH. And who is worth the most? Companies that don't release their models!"
| |
| ▲ | CodingJeebus 9 hours ago | parent [-] | | Completely agree. We're at a point where a frontier model's perceived value always seems to peak right before it releases. | | |
|
|
| ▲ | frank-romita 10 hours ago | parent | prev [-] |
The most highly anticipated model. Looking forward to using it.