embedding-shape 2 days ago
Every time someone brings that up, it brings back memories of frantically trying to finish work as quickly as possible while either my quota slowly ticked down with each API request or my pay-as-you-go bill crept up 0.1% per request. Nowadays I fire off async jobs involving thousands of requests and billions of tokens, and it costs basically the same as if I hadn't. Maybe it takes a different type of person than me, but all these pay-as-you-go/tokens/credits platforms make me nervous; I either avoid them or waste time "optimizing" my usage. Investing in hardware and infrastructure I can run at home, on the other hand, is something my head has no problem just rolling with.
noname120 2 days ago | parent
But the downside is that you're stuck with inferior LLMs. None of the best models have open weights: Gemini 3.5, Claude Sonnet/Opus 4.5, ChatGPT 5.2. The best open-weights model performs an order of magnitude worse than those.