Remix.run Logo
sireat 4 days ago

Basically it boils down that for most queries google/gemini-2.5-flash is the workhorse fast/cheap/good enough.

Add in multimodality, 1M context and it is such a Swiss army knife.

It is cheap and performant enough to run 100k queries. (Took a bit over a day and cost around 30 Euros for a major document classification task). Yes in theory this could have been done with fine-tuned BERT or maybe even with some older methods but it saved way too much time.

There is another factor that may explain why Flash is #1 in most categories on OpenRouter - Flash has gotten reasonably decent at less common human languages.

Most cheap (including Flash Lite) and local models mostly have English focused training.

karmakaze 4 days ago | parent | next [-]

This was my initial assessment as well. Also note:

> Grok I forgot about until it was too late.

I was surprised by how much I prefer Grok to others. Even its persona is how I prefer it, detailed without volunteering unwanted information or sycophanty. In general I'd use Grok-3 more than 4 which is good enough for common uses.

I suspect that Claude would be best, only if I gave it a long complex task with enough instructions up front so it could grind away on it while I was doing something else and not waiting on it.

vjerancrnjak 4 days ago | parent | prev [-]

How do you run so many, I’m constantly exhausting the resources can’t even concurrently call 20 times?

sireat 4 days ago | parent [-]

While I do have multiple OpenRouter accounts(personal and organizational) I did not even look into concurrent calls - it was sequential.

The job was set on Friday and ready on Monday. On average it was about 5k tokens (documents ranging from 1k to 200k in size) and only about 10 tokens out.

Average response was about 1.5 seconds ~ 40 hours for full set.

I really did some heavy prompt testing to limit output.

Even then every few thousand queries you'd get some double token responses. That is Gemini would respond in duplicate - ie Daisy Daisy.