XenophileJKO an hour ago

I'm still kind of surprised that people are targeting edge deployment of MoE models. By design they trade memory efficiency for lower compute cost: only a few experts run per token, but every expert still has to sit in memory. On the edge we generally need the opposite.
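
A rough back-of-the-envelope sketch of that tradeoff, with made-up layer sizes (the function name and all numbers are hypothetical; router and attention weights are ignored):

```python
def moe_ffn_params(d_model, d_ff, n_experts, top_k):
    """Return (resident, active) FFN parameter counts for one MoE layer.

    Assumes each expert is a plain two-matrix FFN with no biases.
    """
    per_expert = 2 * d_model * d_ff    # up-projection + down-projection
    resident = n_experts * per_expert  # every expert must be held in memory
    active = top_k * per_expert        # only the routed experts run per token
    return resident, active


# Hypothetical configuration: 8 experts, 2 routed per token.
resident, active = moe_ffn_params(d_model=1024, d_ff=4096, n_experts=8, top_k=2)
print(f"resident: {resident:,}  active per token: {active:,}  "
      f"ratio: {resident / active:.1f}x")
```

Under these assumptions the layer keeps n_experts / top_k times more weights in memory than it uses for any one token, which is the wrong side of the trade for a memory-constrained device.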

I'm hoping to see more work in the other direction, with cyclic/looped transformers and other memory-dense approaches.