fariszr 5 days ago
These flash models keep getting more expensive with every release. Is there an OSS model that's better than 2.0 Flash with similar pricing, speed, and a 1M context window?

Edit: this is not the typical flash model; it's actually insane value if the benchmarks match real-world usage.

> Gemini 3 Flash achieves a score of 78%, outperforming not only the 2.5 series, but also Gemini 3 Pro. It strikes an ideal balance for agentic coding, production-ready systems and responsive interactive applications.

The replacement for the old flash models will then probably be 3.0 Flash Lite.
thecupisblue 5 days ago
Yes, but 3.0 Flash is cheaper, faster, and better than 2.5 Pro. So if 2.5 Pro was good for your use case, you just got a better model for about a third of the price. It might hurt the wallet a bit more if you currently use 2.5 Flash and want an upgrade, which is fair tbh.
| ||||||||
aoeusnth1 5 days ago
I think it's good: they're raising the size (and price) of Flash a bit and trying to position it as an actually useful coding/reasoning model. There's always Lite for people who want dirt-cheap prices and don't care about quality at all.
sosodev 5 days ago
Nvidia released Nemotron 3 Nano recently, and I think it fits your requirements for an OSS model: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B... It's extremely fast on good hardware, quite smart, and supports up to 1M context with reasonable accuracy.
| ||||||||
mips_avatar 5 days ago
For my app's evals, Gemini Flash and Grok 4 Fast are the only ones worth using. I'd love for an open-weights model to compete in this arena, but I haven't found one.
scrollop 5 days ago
This one is more powerful than OpenAI models, including GPT 5.2 (which is worse on various benchmarks than 5.1), and that's with 5.2 using XHIGH whilst the others were on high, e.g.: https://youtu.be/4p73Uu_jZ10?si=x1gZopegCacznUDA&t=582
fullstackwife 5 days ago
Cost of end-to-end task resolution should be cheaper: even if the cost of a single inference is higher, you need fewer loops to solve a problem now.
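The trade-off can be sketched with hypothetical numbers (the per-token prices, token counts, and loop counts below are made-up assumptions for illustration, not real Gemini or OpenAI pricing):

```python
# Illustrative only: prices and loop counts are assumptions, not real model
# pricing. The point: a pricier model can still win on end-to-end task cost
# if it needs fewer agentic loops to resolve the task.

def task_cost(price_per_mtok: float, tokens_per_loop: int, loops: int) -> float:
    """Total cost of one task = cost per loop x number of loops."""
    return price_per_mtok * (tokens_per_loop / 1_000_000) * loops

# Hypothetical cheaper model: lower per-token price, more iterations needed.
cheap = task_cost(price_per_mtok=0.40, tokens_per_loop=50_000, loops=12)

# Hypothetical pricier model: higher per-token price, fewer iterations.
pricey = task_cost(price_per_mtok=0.70, tokens_per_loop=50_000, loops=5)

print(f"cheaper model: ${cheap:.3f} per task")
print(f"pricier model: ${pricey:.3f} per task")
```

With these assumed numbers the pricier model comes out cheaper end to end ($0.175 vs $0.240 per task), because the loop count dominates.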
| ||||||||