bugglebeetle an hour ago

Unfortunately, this looks to only cover the larger MoE models. I imagine the smaller models are what most people would target. 9B just dropped two days ago, so I'm not surprised it's not explicitly documented, but it does use a hybrid Mamba architecture that I expect needs some special consideration.