lambda 2 hours ago

So, this is all true, but this calculation isn't that nuanced. It's trying to get you into a ballpark range. I entered my own specs, since my hardware isn't in their list, and the results are fairly close to my real experience once I compensate for the fact that it calculates based on total params instead of active params.

Because of that, this calculator is telling you that you should be running entirely dense models, and that sparse MoE models, which may be both faster and perform better, are not recommended.
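
To make the total-vs-active point concrete, here is a rough sketch of the usual back-of-the-envelope estimate for decode speed, assuming generation is memory-bandwidth bound and each token has to read every weight it uses once. The function name and all the numbers are made up for illustration; this is not the calculator's actual formula, and it ignores KV cache reads, compute limits, and runtime overhead.

```python
def decode_tokens_per_second(params_billion, bytes_per_param, bandwidth_gb_s):
    """Rough upper bound on decode speed for a bandwidth-bound model.

    params_billion: parameters read per generated token, in billions
    bytes_per_param: storage per parameter (e.g. 0.5 for 4-bit weights)
    bandwidth_gb_s: memory bandwidth in GB/s
    """
    gb_read_per_token = params_billion * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token


# Hypothetical hardware and model sizes, for illustration only.
bandwidth = 400.0       # GB/s
bytes_per_param = 0.5   # roughly 4-bit quantization

# A hypothetical MoE with 30B total params but 3B active per token.
estimate_from_total = decode_tokens_per_second(30.0, bytes_per_param, bandwidth)
estimate_from_active = decode_tokens_per_second(3.0, bytes_per_param, bandwidth)

print(f"using total params:  {estimate_from_total:.1f} tok/s")
print(f"using active params: {estimate_from_active:.1f} tok/s")
```

Under these assumptions, an estimate based on total params understates MoE decode speed by the ratio of total to active params. Total params still decide whether the model fits in memory at all, so they matter for the memory check, just not for the speed estimate.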

littlestymaar 2 hours ago

I agree, and I even started my response expressing my agreement with the whole point.

But since this is a tech forum, I assumed some people would be interested in a correction of the details that were wrong.