Remix.run Logo
lkm0 2 hours ago

I'm a bit out of the loop, but do we have some grasp on the size of these closed models? Is the trick still adding an order of magnitude to weights and training data or has something changed?

m_w_ 2 hours ago | parent [-]

I think Mythos is rumored to be ~10T parameters, so in this case I think the answer is yes, although I'm sure MoE, looped models, etc play a role in the improvements as well.