jeroenhd 2 hours ago

With current GPU prices, it's difficult to find affordable hardware that can run competent models. gemma4's 26B MoE model seems to offer the best performance per megabyte of RAM, but it's not good enough to use the way one would use cloud models.

The big, impressive models all scale well in multi-customer setups because of the efficiency batching provides, but the base cost of running models like that, even as a small business, is incredibly high. If you can't keep your LLM hardware saturated nearly 24/7, the payback period on the investment is long unless you settle for inferior models that are worse at their job.
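The utilization argument above can be sketched with some back-of-the-envelope arithmetic. All numbers here (hardware cost, lifetime, power draw, throughput) are hypothetical placeholders, not figures from the comment:

```python
def cost_per_million_tokens(
    hw_cost: float,        # upfront hardware cost, e.g. a GPU server (hypothetical)
    lifetime_hours: float, # hours over which the hardware is amortized
    power_kw: float,       # average power draw in kW while running
    elec_per_kwh: float,   # electricity price per kWh
    tokens_per_sec: float, # sustained throughput at full batch saturation
    utilization: float,    # fraction of time the hardware is doing useful work
) -> float:
    """Effective cost per 1M generated tokens at a given utilization.

    Fixed hourly costs (amortized hardware + power) are paid regardless,
    while useful output scales with utilization, so cost per token rises
    roughly as 1/utilization when the machine sits idle.
    """
    hourly_cost = hw_cost / lifetime_hours + power_kw * elec_per_kwh
    tokens_per_hour = tokens_per_sec * 3600 * utilization
    return hourly_cost / tokens_per_hour * 1_000_000

# Hypothetical numbers: $20k server, 3-year life, 1 kW, $0.15/kWh, 1000 tok/s.
busy = cost_per_million_tokens(20_000, 3 * 365 * 24, 1.0, 0.15, 1000, 0.9)
idle = cost_per_million_tokens(20_000, 3 * 365 * 24, 1.0, 0.15, 1000, 0.1)
print(f"90% utilization: ${busy:.2f}/M tokens")
print(f"10% utilization: ${idle:.2f}/M tokens")
```

With these made-up inputs, dropping from 90% to 10% utilization makes each token roughly nine times more expensive, which is the gap that lets multi-tenant cloud providers undercut a lightly loaded in-house box.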