CSMastermind 2 days ago
Some other fun things you'll find:

- The models perform differently when called via the API vs in the Gemini UI.
- The Gemini API will randomly fail about 1% of the time, so retry logic is basically mandatory.
- API performance is heavily at the whims of Google: we've observed latency spreads between 30 seconds and 4 minutes for the same query, depending on how Google is feeling that day.
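Since transient failures are apparently unavoidable, a minimal retry wrapper with exponential backoff and jitter is the usual fix. This sketch isn't tied to any particular SDK; `fn` stands in for whatever client call you're wrapping:

```python
import random
import time


def call_with_retry(fn, max_retries=5, base_delay=1.0):
    """Retry a flaky API call with exponential backoff plus jitter.

    fn          -- a zero-argument callable wrapping the actual API call
    max_retries -- total attempts before giving up
    base_delay  -- starting backoff in seconds, doubled each attempt
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            # Re-raise on the final attempt instead of sleeping again.
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with a little random jitter so that
            # many clients don't all retry at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```

In practice you'd only catch the SDK's transient error types (429s, 5xx) rather than bare `Exception`, so genuine bugs still surface immediately.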
hobofan 2 days ago
> The Gemini API will randomly fail about 1% of the time, retry logic is basically mandatory.

That is sadly true across the board for AI inference API providers. OpenAI and Anthropic API stability usually suffers around launch events, and Azure OpenAI/Foundry serving regularly returns 500 errors for stretches of time. For any production feature with high uptime guarantees, I would right now strongly advise picking a model you can get from multiple providers and building failover between clouds.
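The cross-cloud failover described above can be sketched as a simple ordered fallback. The provider names and `call` functions here are hypothetical placeholders for whatever per-cloud clients you actually run:

```python
def generate_with_failover(prompt, providers):
    """Try each (name, call) provider in order; return the first success.

    providers -- list of (name, callable) pairs, ordered by preference;
                 each callable takes the prompt and returns a response.
    """
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:
            # Record the failure and fall through to the next provider.
            errors.append((name, exc))
    raise RuntimeError(f"All providers failed: {errors}")
```

A usage example, assuming you have wrapped the same model behind two clouds:

```python
response = generate_with_failover(
    "Summarize this document...",
    [("vertex", call_vertex), ("bedrock", call_bedrock)],  # hypothetical wrappers
)
```

The catch is that "the same model" on two providers can still behave differently (quantization, system prompts, sampling defaults), so failover keeps you up but doesn't guarantee identical outputs.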
| ||||||||||||||||||||||||||||||||
specproc 2 days ago
I have also had some super weird stuff in my output (2.5-flash). I'm passing docs for bulk inference via Vertex, and a small number of returned results include gibberish in Japanese.
| ||||||||||||||||||||||||||||||||
halflings 2 days ago
> The models perform differently when called via the API vs in the Gemini UI.

This shouldn't be surprising: the model != the product. In the same way, GPT-4o via the API behaves differently than the ChatGPT product, even when ChatGPT is using GPT-4o.
akhilnchauhan 2 days ago
> The models perform differently when called via the API vs in the Gemini UI.

This difference between API and UI responses is common across all the big players (Claude, GPT models, etc.). The consumer chat interfaces are designed for a different experience than a direct API call, even when hitting the same model.
DANmode 2 days ago
So, not something for a production app yet. | ||||||||||||||||||||||||||||||||
ianberdin 2 days ago
Even funnier: sometimes Pro 3 answers a previous message in my chat, just producing a duplicate answer in different words. Retrying helps, but…
| ||||||||||||||||||||||||||||||||
te_chris 2 days ago
The gap between how the models behave in Vertex AI Studio vs the API is unforgivable. Totally different.