brikym 2 days ago

It's all trade-offs between price, speed and accuracy. A free model is no good when the latency is 10s+ and the throughput is below 100 tokens/s, which is often the case on OpenRouter. I have to use a speedy provider like Groq and a small model. Dumber models need a lot more context to correct their inaccuracies. I'm mostly using mid-tier models like Gemini 3 Flash to generate the boards, and then I use the fastest models to answer questions (currently gpt-oss-120b on Groq).
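
As a rough illustration of this kind of trade-off (a sketch with made-up numbers and hypothetical names, not necessarily how the actual setup works): filter the candidates by latency, throughput and quality limits for each task, then take the cheapest one that survives. The option names, the figures, and the `pick` helper are all assumptions for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str            # placeholder label, not a real model or provider
    latency_s: float     # time to first token in seconds (made-up figure)
    tokens_per_s: float  # output throughput (made-up figure)
    price: float         # relative cost per request (made-up figure)
    quality: int         # rough 1-5 accuracy rank (made-up figure)


def pick(options, max_latency_s, min_tokens_per_s, min_quality):
    """Return the cheapest option that meets all three limits, or None."""
    ok = [
        o for o in options
        if o.latency_s <= max_latency_s
        and o.tokens_per_s >= min_tokens_per_s
        and o.quality >= min_quality
    ]
    # Cheapest first; break price ties by lower latency.
    return min(ok, key=lambda o: (o.price, o.latency_s), default=None)


OPTIONS = [
    ModelOption("free-slow", latency_s=12.0, tokens_per_s=40.0, price=0.0, quality=3),
    ModelOption("mid-tier", latency_s=2.0, tokens_per_s=150.0, price=1.0, quality=4),
    ModelOption("small-fast", latency_s=0.3, tokens_per_s=500.0, price=0.3, quality=2),
]

# Board generation tolerates some latency but needs higher quality.
board_model = pick(OPTIONS, max_latency_s=5.0, min_tokens_per_s=100.0, min_quality=4)

# Answering questions is interactive, so speed matters most.
answer_model = pick(OPTIONS, max_latency_s=1.0, min_tokens_per_s=300.0, min_quality=2)
```

With these invented figures the free option is filtered out of both tasks by its latency and throughput, even though it is the cheapest.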