fariszr 5 days ago
These flash models keep getting more expensive with every release. Is there an OSS model that's better than 2.0 Flash with similar pricing, speed, and a 1M context window?

Edit: this is not the typical flash model; it's actually insane value if the benchmarks match real-world usage.

> Gemini 3 Flash achieves a score of 78%, outperforming not only the 2.5 series, but also Gemini 3 Pro. It strikes an ideal balance for agentic coding, production-ready systems and responsive interactive applications.

The replacement for the old flash models will then probably be 3.0 Flash Lite.
thecupisblue 5 days ago
Yes, but 3.0 Flash is cheaper, faster, and better than 2.5 Pro. So if 2.5 Pro was good for your use case, you just got a better model for about a third of the price. It might hurt the wallet a bit more if you currently use 2.5 Flash and want an upgrade, which is fair tbh.
| ||||||||
aoeusnth1 5 days ago
I think it's good: they're raising the size (and price) of Flash a bit and trying to position it as an actually useful coding/reasoning model. There's always Lite for people who want dirt-cheap prices and don't care about quality at all.
sosodev 5 days ago
Nvidia released Nemotron 3 Nano recently, and I think it fits your requirements for an OSS model: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B... It's extremely fast on good hardware, quite smart, and supports up to 1M context with reasonable accuracy.
| ||||||||
mips_avatar 5 days ago
For my app's evals, Gemini Flash and Grok 4 Fast are the only ones worth using. I'd love for an open-weights model to compete in this arena, but I haven't found one.
scrollop 5 days ago
This one is more powerful than OpenAI models, including GPT 5.2 (which is worse on various benchmarks than 5.1), and that's with 5.2 using XHIGH whilst the others were on high, e.g.: https://youtu.be/4p73Uu_jZ10?si=x1gZopegCacznUDA&t=582
fullstackwife 5 days ago
Cost of end-to-end task resolution should be cheaper: even if the cost of a single inference is higher, you need fewer loops to solve a problem now.
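The trade-off can be sketched with hypothetical numbers (the per-token prices, token counts, and loop counts below are made-up assumptions for illustration, not real Gemini or OpenAI pricing):

```python
# Illustrative only: prices and loop counts are assumptions, not real model
# pricing. The point: a pricier model can still win on end-to-end task cost
# if it needs fewer agentic loops to resolve the task.

def task_cost(price_per_mtok: float, tokens_per_loop: int, loops: int) -> float:
    """Total cost of one task = cost per loop x number of loops."""
    return price_per_mtok * (tokens_per_loop / 1_000_000) * loops

# Hypothetical cheaper model: lower per-token price, more iterations needed.
cheap = task_cost(price_per_mtok=0.40, tokens_per_loop=50_000, loops=12)

# Hypothetical pricier model: higher per-token price, fewer iterations.
pricey = task_cost(price_per_mtok=0.70, tokens_per_loop=50_000, loops=5)

print(f"cheaper model: ${cheap:.3f} per task")
print(f"pricier model: ${pricey:.3f} per task")
```

With these assumed numbers the pricier model comes out cheaper end to end ($0.175 vs $0.240 per task), because the loop count dominates.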
| ||||||||