For what it's worth, it was notably trained with the Muon optimizer, but I don't know how much can be attributed to that alone.