littlestymaar 11 hours ago

May I ask why you went for 7B and 32B dense models instead of a small MoE like Qwen3-30B-A3B or gpt-oss-20b, given how successful those MoE models have been?

fnbr 10 hours ago | parent | next [-]

MoEs have a lot of technical complexity and aren't well supported in the open source world. We plan to release a MoE soon(ish).

I do think MoEs are clearly the future, and we will release more of them once we have the tech in place to do so efficiently. For every use case except local usage, MoEs are clearly superior to dense models.

trebligdivad 2 hours ago | parent [-]

Even locally, MoEs are just so much faster, and they let you pick a larger or less-quantized model and still get useful speed.
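
A rough sketch of why that holds: local decode speed is roughly memory-bandwidth bound, and a MoE only reads its *active* weights per token, while a dense model reads all of them. The numbers below (bandwidth, quantization, parameter counts) are illustrative assumptions, not benchmarks, and router/attention overhead is ignored.

```python
# Back-of-the-envelope decode-speed estimate for a bandwidth-bound local setup.
# All constants are assumed for illustration only.

def est_tokens_per_sec(active_params_b: float, bytes_per_param: float, mem_bw_gb_s: float) -> float:
    """Each decoded token streams every *active* weight through memory roughly once."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return mem_bw_gb_s * 1e9 / bytes_per_token

MEM_BW = 100.0  # GB/s, assumed laptop-class memory bandwidth

# Dense 32B at ~4-bit quantization (~0.5 bytes/param) touches all 32B weights per token.
dense_32b = est_tokens_per_sec(32, 0.5, MEM_BW)

# A 30B-total / 3B-active MoE at the same quantization touches only ~3B weights per token,
# even though all 30B still have to fit in memory.
moe_30b_a3b = est_tokens_per_sec(3, 0.5, MEM_BW)

print(f"dense 32B : ~{dense_32b:.1f} tok/s")   # ~6 tok/s under these assumptions
print(f"MoE 3B act: ~{moe_30b_a3b:.1f} tok/s") # ~67 tok/s under these assumptions
```

Under those assumed numbers the MoE decodes roughly 10x faster than the dense model of similar total size, which is the "large model, still usable speed" trade-off described above.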

riazrizvi 10 hours ago | parent | prev [-]

The 7B runs on my Intel MacBook Pro. That serves a broad practical need: developers can figure out a project on their own hardware before committing to a bigger model, which improves the time/cost/effort economics.
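
A minimal sketch of that prototype-locally workflow, assuming a hypothetical quantized 7B GGUF file and the llama-cpp-python bindings (any local runtime would do):

```python
# Local prototyping loop with llama-cpp-python; the model path and prompt are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="models/your-7b-q4_k_m.gguf",  # hypothetical quantized 7B checkpoint
    n_ctx=4096,     # modest context window to stay within laptop RAM
    n_threads=8,    # tune to the machine's physical cores
)

out = llm(
    "Summarize the trade-offs between dense and MoE language models.",
    max_tokens=256,
    temperature=0.2,
)
print(out["choices"][0]["text"])
```

Once the prompts and evaluation loop work at 7B, swapping in a larger model is just a change of `model_path` (or of the remote endpoint), so the expensive run is only made after the cheap one succeeds.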