hobofan 2 days ago
> The Gemini API will randomly fail about 1% of the time, retry logic is basically mandatory.

That is sadly true across the board for AI inference API providers. OpenAI and Anthropic API stability usually suffers around launch events, and Azure OpenAI/Foundry serving regularly returns 500 errors for stretches at a time. For any production feature with high uptime guarantees, right now I would strongly advise picking a model you can get from multiple providers and having failover between clouds.
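To make that concrete, here's roughly what a retry-plus-failover wrapper can look like. The provider call functions are placeholders for whatever SDK calls you actually use; this is a sketch, not a drop-in implementation:

```python
import random
import time

# Placeholder provider calls -- each takes a prompt and returns text,
# raising an exception on 5xx errors or timeouts. Swap in the real SDK
# calls for your primary and fallback clouds.
def call_primary(prompt: str) -> str: ...
def call_fallback(prompt: str) -> str: ...

def complete_with_failover(prompt, providers, retries_per_provider=3):
    """Try each provider in order, retrying with exponential backoff + jitter."""
    last_err = None
    for call in providers:
        for attempt in range(retries_per_provider):
            try:
                return call(prompt)
            except Exception as err:  # in real code, catch the SDK's error types
                last_err = err
                # Exponential backoff with jitter: ~1s, ~2s, ~4s ...
                time.sleep((2 ** attempt) + random.random())
        # All retries on this provider exhausted -- fail over to the next one.
    raise RuntimeError("all providers failed") from last_err

# Usage: complete_with_failover("Summarize ...", [call_primary, call_fallback])
```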
downsplat 2 days ago
Yeah, at $WORK we use various LLM APIs to analyze text; it's not heavy usage in terms of tokens, but maybe 10K calls per day. We've found that response times vary a lot, sometimes going over a minute for simple tasks, and random failures do happen. Retry logic is definitely mandatory, and it's good to have multiple providers ready. We're abstracting calls across three different APIs (openai, gemini and mistral; btw, we're getting pretty good results with mistral!) so we can switch workloads quickly if needed.
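The abstraction layer can stay pretty thin, something along these lines (the adapter names are made up; each one would wrap that vendor's SDK behind one common signature):

```python
from typing import Callable

# One shared signature across providers: text in, analysis out.
Provider = Callable[[str], str]

# Illustrative adapters, not real library calls -- each hides one vendor's SDK.
def openai_adapter(text: str) -> str: ...
def gemini_adapter(text: str) -> str: ...
def mistral_adapter(text: str) -> str: ...

PROVIDERS: dict[str, Provider] = {
    "openai": openai_adapter,
    "gemini": gemini_adapter,
    "mistral": mistral_adapter,
}

def analyze(text: str, provider: str = "mistral") -> str:
    # Switching a workload to another provider is a config change, not a code change.
    return PROVIDERS[provider](text)
```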