awongh 5 days ago

This is great, but as others have mentioned the UX problem is more complicated than this:

- for other models there are providers that serve the same model with different prices

- each provider optimizes for different parameters: speed, cost, etc.

- the same model can still be served at different quantizations

- some providers offer batch pricing and some don't (e.g., the Grok API does not)

And there are plenty of other parameters to filter over: thinking vs. non-thinking, multi-modal or not, etc., not to mention benchmark rankings.

https://artificialanalysis.ai gives a blended cost number which helps with sorting a bit, but a blended number for input/output costs is going to change depending on what you're doing.
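To make that concrete, here is a rough sketch of how a blended number can flip the ranking depending on your input/output mix. The two models and their prices are made up for illustration, and the weighting is just a plain weighted average; I'm not claiming this is how any particular site computes its blended figure.

```python
def blended_cost(input_price, output_price, input_share):
    """Weighted average price per 1M tokens.

    input_share is the fraction of your total tokens that are input tokens.
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_price + (1.0 - input_share) * output_price


# Hypothetical prices in USD per 1M tokens (not real listings).
MODEL_A = {"input": 0.50, "output": 4.00}
MODEL_B = {"input": 1.50, "output": 2.00}


def cheaper_model(input_share):
    """Return "A" or "B", whichever has the lower blended cost for this mix."""
    a = blended_cost(MODEL_A["input"], MODEL_A["output"], input_share)
    b = blended_cost(MODEL_B["input"], MODEL_B["output"], input_share)
    return "A" if a < b else "B"


if __name__ == "__main__":
    # Input-heavy workload (3:1 input to output), e.g. summarization.
    print(cheaper_model(0.75))
    # Output-heavy workload (1:3 input to output), e.g. long generations.
    print(cheaper_model(0.25))
```

With these made-up prices, model A wins for the input-heavy mix and model B wins for the output-heavy one, so a single blended column can't sort both workloads correctly.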

I'm still holding my breath for a site that has a really nice comparison UI.

Someone please build it!

numlocked 5 days ago | parent | next [-]

(I work at OpenRouter)

We have a simple model comparison tool that is not-at-all-obvious to find on the website, but hopefully can help somewhat. E.g.

https://openrouter.ai/compare/qwen/qwen3-coder/moonshotai/ki...

pzo 4 days ago | parent [-]

But this is not very user friendly unless you already know which models you want to compare. I would prefer to switch some toggles or write some kind of query describing what kind of models I'm looking for for my use case, and then sort by speed or price at the end, e.g. query:

I want model:

1) with audio input

2) minimum 50 tps speed

3) max price less than $1 input and less than $3 output

4) needs to support English only, or also needs to support Polish, etc.

sort by WER or some benchmark, display charts, etc.

edit:

Extra bonus if I can tell it how big my typical prompt will be, like 20 seconds of audio, and it will figure out how many tokens that is, because e.g. Gemini 2.0 Flash hides it very deep that it's supposed to be 32 tokens per 1 second. It's similarly hard to find how many tokens an image input is sometimes. It would be good if I could also attach some text or sample input to a query to do the calculation.
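The calculation itself is tiny once you know the rate. A minimal sketch, assuming the 32 tokens per second figure above is right (check the current docs) and using a placeholder price:

```python
import math

# Rate quoted above for Gemini 2.0 Flash audio input; treat it as an assumption.
AUDIO_TOKENS_PER_SECOND = 32


def audio_tokens(seconds, tokens_per_second=AUDIO_TOKENS_PER_SECOND):
    """Estimate the input token count for an audio clip of the given length."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return math.ceil(seconds * tokens_per_second)


def input_cost_usd(tokens, price_per_million):
    """Cost of the input tokens at a given USD price per 1M tokens."""
    return tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    tokens = audio_tokens(20)  # 20 seconds of audio
    # 1.00 USD per 1M tokens is a placeholder price, not a real listing.
    print(tokens, input_cost_usd(tokens, 1.00))
```

A 20 second clip comes out to 640 tokens at that rate.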

Ideally I write such a prompt and it shows me results, or maps the prompt to SQL that executes on your data so I can tweak the SQL query on the website. It doesn't have to be SQL; it can be some other simple but deterministic query language.
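Something like this is what I have in mind for the SQL side. It's only a sketch: the `listings` table, its columns, and the rows are all invented for illustration (this is not OpenRouter's actual schema), and I left out the language filter to keep it short.

```python
import sqlite3

# Hypothetical listings: (model, provider, audio_input, tps, input_price, output_price, wer)
# Prices are USD per 1M tokens; wer is word error rate (lower is better).
ROWS = [
    ("model-a", "provider-x", 1, 80.0, 0.40, 1.60, 0.08),
    ("model-a", "provider-y", 1, 45.0, 0.30, 1.20, 0.08),   # too slow
    ("model-b", "provider-x", 1, 120.0, 0.90, 2.50, 0.05),
    ("model-c", "provider-z", 0, 200.0, 0.10, 0.40, None),  # no audio input
    ("model-d", "provider-z", 1, 60.0, 1.20, 3.50, 0.04),   # too expensive
]

QUERY = """
SELECT model, provider
FROM listings
WHERE audio_input = 1
  AND tps >= :min_tps
  AND input_price < :max_input
  AND output_price < :max_output
ORDER BY wer ASC
"""


def run_query(params):
    """Load the sample rows into an in-memory database and run the filter query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE listings (model TEXT, provider TEXT, audio_input INTEGER,"
            " tps REAL, input_price REAL, output_price REAL, wer REAL)"
        )
        conn.executemany("INSERT INTO listings VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
        return conn.execute(QUERY, params).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Audio input, at least 50 tps, under $1 input and under $3 output.
    print(run_query({"min_tps": 50, "max_input": 1.0, "max_output": 3.0}))
```

Since each row is a (model, provider) pair, the same model can show up several times with different speed and price, and the user can edit the WHERE clause or ORDER BY directly.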

zeroCalories 5 days ago | parent | prev | next [-]

I think it would be very hard to make a fair comparison. Best you could do is probably make the trade-offs clear and let people make their own choices. I think it could be cool to make something like a token exchange where people put up their requirements, and then companies offer competing services that fit those requirements. Would be cool to let random people offer their compute, but you would need to find a way to handle people lying about their capabilities or stealing data.

alexellman 5 days ago | parent | prev | next [-]

Would a column for "provider", meaning the place you are actually making the call to, solve this?

svachalek 5 days ago | parent | prev [-]

Please, not benchmark rankings. We've encouraged this nonsense for far too long already.