reilly3000 5 hours ago

There are plenty of 3rd party and big cloud options to run these models by the hour or token. Big models really only work in that context, and that’s ok. Or you can get yourself an H100 rack and go nuts, but there is little downside to using a cloud provider on a per-token basis.

cubefox 3 hours ago | parent [-]

> There are plenty of 3rd party and big cloud options to run these models by the hour or token.

Which ones? I wanted to try a large base model for automated literature generation (fine-tuned models are a lot worse at it), but I couldn't find a provider that makes this easy.

reilly3000 21 minutes ago | parent | next [-]

If you’re already using GCP, Vertex AI is pretty good. You can run lots of models on it:

https://docs.cloud.google.com/vertex-ai/generative-ai/docs/m...

Lambda.ai used to offer per-token pricing, but they have moved upmarket. You can still rent a B200 instance for sub $5/hr, which is reasonable for experimenting with models.

https://app.hyperbolic.ai/models Hyperbolic offers both GPU hosting and per-token pricing for popular OSS models. The token-based options are easy because they're usually a drop-in replacement for OpenAI API endpoints.
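
"Drop-in replacement" here means you only swap the base URL and API key; the request shape stays the OpenAI one. A minimal stdlib sketch of what that request looks like (the base URL and model name below are placeholders, not real provider values -- check your provider's docs):

```python
import json

def build_chat_request(base_url, api_key, model, prompt):
    """Build an OpenAI-style /chat/completions request.

    Works against any OpenAI-compatible provider: only base_url
    and api_key change between providers.
    """
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

# Placeholder base URL / model name -- substitute your provider's values.
url, headers, body = build_chat_request(
    "https://api.example-provider.com/v1",
    "sk-...",
    "some-oss-model",
    "Hello!",
)
print(url)
```

From there you'd POST `body` to `url` with any HTTP client (requests, urllib, curl), or just point the official OpenAI SDK's `base_url` at the provider.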

You have to rent a GPU instance if you want to run the latest or custom stuff, but if you just want to play around for a few hours, it's not unreasonable.

verdverm 15 minutes ago | parent [-]

GCloud and Hyperbolic have been my go-tos as well

big_man_ting 40 minutes ago | parent | prev [-]

Have you checked whether any of the providers on OpenRouter serve the model you need?