victorbjorklund 5 hours ago

Note that MoE doesn't mean different experts for different types of problems. Routing is per token and not really connected to problem type.

So if you send some Python code, the first token in a function can be routed to one expert, the second to another expert, and so on.
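To make the per-token point concrete, here is a minimal sketch of top-k routing in plain Python. Everything in it is an assumption for illustration: the router weights and the per-token hidden vectors are random stand-ins, `route` is a hypothetical helper, and 8 experts with top-2 selection is just a common configuration, not any specific model's. The only point is that the expert choice is made separately for each token, from that token's hidden state.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(token_vec, router_weights, top_k=2):
    """Return top_k (expert_index, gate) pairs for a single token."""
    # One logit per expert: dot product of that expert's router row with the token.
    logits = [sum(w * x for w, x in zip(row, token_vec)) for row in router_weights]
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Renormalise the gates over the selected experts only.
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


rng = random.Random(0)
NUM_EXPERTS, DIM = 8, 16  # assumed sizes, for illustration only
router_weights = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

tokens = ["def", " add", "(", "a", ",", " b", "):"]
# Hypothetical stand-in for hidden states: one random vector per token.
hidden = {t: [rng.gauss(0, 1) for _ in range(DIM)] for t in tokens}

# Routing is decided independently for every token.
assignments = {t: route(hidden[t], router_weights) for t in tokens}
for t in tokens:
    print(repr(t), [i for i, _ in assignments[t]])
```

Nothing in `route` knows that the input is Python code or what the problem is about. It only sees one token's vector at a time, which is why neighbouring tokens in the same function can end up on different experts.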

dotancohen 2 hours ago | parent [-]

Can you back this up with documentation? I don't believe that this is the case.

pixelmelt an hour ago | parent [-]

Check out Unsloth's REAP models: you can outright delete a few of the lesser-used experts without the model going braindead, since they can all handle each token but some are better suited to do so.
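A toy sketch of why deleting experts doesn't strand any tokens. It is not the actual REAP procedure: it prunes by raw usage count, which is a simplification assumed here, and the router weights, token vectors, sizes, and the `top_k_experts` helper are all hypothetical. After pruning, the router picks from the surviving experts and every token still gets a full set of experts.

```python
import random
from collections import Counter


def top_k_experts(token_vec, router_weights, allowed, top_k=2):
    """Pick the top_k experts for one token, restricted to the `allowed` indices."""
    logits = {
        i: sum(w * x for w, x in zip(router_weights[i], token_vec))
        for i in allowed
    }
    return sorted(logits, key=logits.get, reverse=True)[:top_k]


rng = random.Random(0)
NUM_EXPERTS, DIM, NUM_TOKENS, NUM_PRUNED = 8, 16, 200, 2  # assumed sizes
router_weights = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
# Random vectors standing in for token hidden states.
tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_TOKENS)]

# 1. Count how often each expert is selected across the sample.
all_experts = list(range(NUM_EXPERTS))
usage = Counter({i: 0 for i in all_experts})
for vec in tokens:
    usage.update(top_k_experts(vec, router_weights, all_experts))

# 2. Drop the least-used experts (simplified criterion, see note above).
pruned = sorted(all_experts, key=lambda i: usage[i])[:NUM_PRUNED]
kept = [i for i in all_experts if i not in pruned]

# 3. Route again: tokens that used a pruned expert fall back to their next-best one.
rerouted = [top_k_experts(vec, router_weights, kept) for vec in tokens]

print("usage:", dict(usage))
print("pruned:", pruned, "kept:", kept)
```

This only shows that routing still works after pruning. Whether output quality holds up depends on the real model and the pruning criterion, which this sketch doesn't measure.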