embedding-shape 2 days ago
Every time someone brings that up, it brings back memories of frantically trying to finish work as quickly as possible while either my quota slowly ticked down with each API request or my pay-as-you-go bill crept up 0.1% per request. Nowadays I fire off async jobs involving thousands of requests and billions of tokens, and it costs basically the same as if I hadn't. Maybe it takes a different type of person than me, but all these pay-as-you-go/tokens/credits platforms make me nervous; I either avoid them or waste time "optimizing" my usage. Investing in hardware and infrastructure I can run at home, on the other hand, is something my head has no problem just rolling with.
noname120 2 days ago | parent
But the downside is that you're stuck with inferior LLMs. None of the best models have open weights: Gemini 3.5, Claude Sonnet/Opus 4.5, ChatGPT 5.2. The best open-weights model performs an order of magnitude worse than those.