lostmsu 7 hours ago

Have you actually tried training a 10MB MoE (one that would train in a few days on a 3090)?
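For a sense of scale, here is a rough parameter budget for a hypothetical tiny MoE transformer. It assumes "10MB" means fp32 weights on disk (4 bytes per parameter). The config values (`vocab_size`, `d_model`, `n_layers`, `n_experts`, `d_ff`, `top_k`) are illustrative choices, not from any real model. The count ignores biases and layer norms and assumes tied input/output embeddings.

```python
from dataclasses import dataclass


@dataclass
class TinyMoEConfig:
    # Hypothetical sizes chosen only to land near 10MB at fp32.
    vocab_size: int = 2048
    d_model: int = 128
    n_layers: int = 4
    n_experts: int = 8
    d_ff: int = 256
    top_k: int = 1  # experts used per token


def count_params(cfg: TinyMoEConfig) -> tuple[int, int]:
    """Return (total, active-per-token) weight counts, ignoring biases/norms."""
    embed = cfg.vocab_size * cfg.d_model      # tied input/output embedding
    attn = 4 * cfg.d_model * cfg.d_model      # Q, K, V, O projections
    expert = 2 * cfg.d_model * cfg.d_ff       # FFN up + down projections
    router = cfg.d_model * cfg.n_experts      # linear gate over experts
    total = embed + cfg.n_layers * (attn + cfg.n_experts * expert + router)
    active = embed + cfg.n_layers * (attn + cfg.top_k * expert + router)
    return total, active


cfg = TinyMoEConfig()
total, active = count_params(cfg)
print(f"total={total:,} active={active:,} fp32_bytes={total * 4:,}")
# → total=2,625,536 active=790,528 fp32_bytes=10,502,144
```

With top-1 routing, each token touches only about 30% of the weights. That is part of why a model this small stays cheap to train on a single consumer GPU.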

I've come to the opinion that most current AI research can easily be reproduced at small scale. CoT is possibly the only exception, since it seems to require certain emergent behavior, but even there I'm not sure it's impossible to retrofit onto tiny models.