xscott 5 hours ago
Of course I can't be certain, but I think the "mixture of experts" design plays into it too. Metaphorically, there's a mid-level manager who looks at your prompt and tries to decide which experts it should be sent to. If he thinks you won't notice, he saves money by sending it to the undergraduate intern. Just a theory.
victorbjorklund 5 hours ago
Note that MoE doesn't mean different experts for different types of problems. Routing happens per token and isn't really connected to the problem type. So if you send some Python code, the first token in a function can go to one expert, the second to another expert, and so on.
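To make the per-token point concrete, here is a rough toy sketch of top-k routing. Everything in it is an assumption for illustration: the expert count, the top-k value, the random router weights, and the `toy_hidden_state` stand-in (a real model routes on context-dependent hidden states, and does so separately at every MoE layer).

```python
import math
import random

NUM_EXPERTS = 8  # assumed value, for illustration only
TOP_K = 2        # assumed value, for illustration only
DIM = 16         # toy hidden size

rng = random.Random(0)

# Hypothetical router: one weight vector per expert.
# Random here; learned during training in a real model.
router_weights = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]


def toy_hidden_state(token):
    """Stand-in for the hidden state a real model would compute for a token."""
    token_rng = random.Random(token)  # seeded by the token text, so it is repeatable
    return [token_rng.gauss(0, 1) for _ in range(DIM)]


def route(hidden):
    """Return TOP_K (expert index, gate weight) pairs for a single token."""
    # One score per expert: dot product of the router weights and the hidden state.
    logits = [sum(w * h for w, h in zip(weights, hidden)) for weights in router_weights]
    # Keep only the TOP_K highest-scoring experts.
    top = sorted(range(NUM_EXPERTS), key=lambda e: logits[e], reverse=True)[:TOP_K]
    # Softmax over the selected experts only, so the gate weights sum to 1.
    peak = max(logits[e] for e in top)
    exps = [math.exp(logits[e] - peak) for e in top]
    total = sum(exps)
    return [(e, x / total) for e, x in zip(top, exps)]


# The tokens of one small Python function header, routed one at a time.
tokens = ["def", " add", "(", "a", ",", " b", "):"]
assignments = {tok: route(toy_hidden_state(tok)) for tok in tokens}

for tok, experts in assignments.items():
    print(repr(tok), [e for e, _ in experts])
```

The router never sees "this is a Python question". It only scores one token's vector at a time, so neighbouring tokens in the same function are free to land on different experts.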