marmarama a day ago

It's really just a performance tradeoff, and where your acceptable performance level is.

Ollama, for example, will let you run any available model on just about any hardware. But using the CPU alone is _much_ slower than running it on any reasonable GPU, and obviously CPU performance varies massively too.

You can even run models that are bigger than available RAM, but performance will be terrible.

The ideal case is to have a fast GPU and run a model that fits entirely within the GPU's memory. In these cases you might measure the model's processing speed in tens of tokens per second.
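A rough way to check whether a model "fits entirely within the GPU's memory" is to estimate the size of its weights from the parameter count and the bits stored per weight. This is a minimal sketch, not how any particular runtime does its accounting: the 2 GiB overhead allowance for KV cache and buffers is an assumed placeholder, and the 7B/70B models and 24 GiB card are hypothetical examples.

```python
def weights_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_vram(
    params_billions: float,
    bits_per_weight: float,
    vram_gib: float,
    overhead_gib: float = 2.0,  # assumed allowance for KV cache and buffers
) -> bool:
    """Rough check: weights plus the assumed overhead must fit in GPU memory."""
    needed = weights_gib(params_billions, bits_per_weight) + overhead_gib
    return needed <= vram_gib


if __name__ == "__main__":
    # Hypothetical examples: 4-bit quantized models on a 24 GiB card.
    for size in (7, 70):
        gib = weights_gib(size, 4)
        verdict = "fits" if fits_in_vram(size, 4, 24) else "does not fit"
        print(f"{size}B at 4 bits: ~{gib:.1f} GiB of weights, {verdict} in 24 GiB")
```

Real memory use also depends on context length and the runtime, so treat the result as a first filter rather than a guarantee.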

As conditions get less ideal, processing speed drops. On CPU alone with a model that fits in RAM, you'd max out in the low single digits of tokens per second, and on lower-performance hardware you start talking about seconds per token instead. If the model does not fit in RAM, the measurement becomes minutes per token.
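To put those tiers in wall-clock terms, here is a small sketch that converts a generation speed into the wait for one response. The specific rates (30 and 3 tokens per second, 2 seconds per token, 2 minutes per token) and the 500-token response length are illustrative assumptions picked from within the ranges described above, not measurements.

```python
def wait_seconds(response_tokens: int, tokens_per_second: float) -> float:
    """Time to generate a response at a steady generation speed."""
    return response_tokens / tokens_per_second


def humanize(seconds: float) -> str:
    """Format a duration in the largest sensible unit."""
    if seconds < 60:
        return f"{seconds:.0f} s"
    if seconds < 3600:
        return f"{seconds / 60:.1f} min"
    return f"{seconds / 3600:.1f} h"


# Assumed example rates, expressed as tokens per second.
TIERS = {
    "GPU, model fits in GPU memory": 30.0,
    "CPU only, model fits in RAM": 3.0,
    "low-end hardware (2 s per token)": 1 / 2,
    "model larger than RAM (2 min per token)": 1 / 120,
}

if __name__ == "__main__":
    response_tokens = 500  # assumed length of a moderately long answer
    for label, rate in TIERS.items():
        print(f"{label}: {humanize(wait_seconds(response_tokens, rate))}")
```

Under these assumptions the same 500-token answer takes about 17 seconds on the fast GPU and roughly 17 hours once the model no longer fits in RAM.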

For most people, their minimum acceptable performance level is in the double digit tokens per second range, which is why people optimize for that with high-end GPUs with as much memory as possible, and choose models that fit inside the GPU's RAM. But in theory you can run large models on a potato, if you're prepared to wait until next week for an answer.

mark_l_watson a day ago

+1

> It's really just a performance tradeoff, and where your acceptable performance level is.

I am old enough to remember developers respecting the economics of running the software they create.

Running Ollama locally, and occasionally falling back to Ollama Cloud when required, is a nice option if you use it enough. I have twice signed up and paid $20/month for Ollama Cloud. I love the service, but I use it so rarely (because local models are so often sufficient) that I cancelled both times.

If Ollama ever implements a pay-as-you-go API for Ollama Cloud, then I will be a long-term customer. I like the business model of OpenRouter, but I enjoy using Ollama Cloud more.

I am probably in the minority, but I wish subscription plans would go away and Claude Code, gemini-cli, codex, etc. were all available only as pay-as-you-go, with ‘anti-dumping’ laws applied to running unsustainable businesses.

I don’t mean to pick on OpenAI, but I think the way they fund their operations actually helps threaten the long term viability of our economy. Our government making the big all-in bet on AI dominance seems crazy to me.