Curious, how did you settle on Haiku/Sonnet? Because there are much cheaper models on OpenRouter that probably perform comparably...

Consider:

Haiku 4.5: $1/M input tokens | $5/M output tokens
MiniMax M2.7: $0.30/M input tokens | $1.20/M output tokens
Kimi K2.5: $0.45/M input tokens | $2.20/M output tokens

I haven't tried this, so I can't say for sure, but from personal experience I think M2.7 and K2.5 can match Haiku, and probably exceed it on most tasks, for much less.
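To put the quoted prices side by side, here is a rough sketch that prices one workload under each model. The per-million-token prices are the ones listed above; the workload size (10M input tokens, 1M output tokens) and the `workload_cost` helper are made up for illustration.

```python
# USD per million tokens (input, output), as quoted in the comment above.
PRICES = {
    "Haiku 4.5": (1.00, 5.00),
    "MiniMax M2.7": (0.30, 1.20),
    "Kimi K2.5": (0.45, 2.20),
}


def workload_cost(model, input_tokens, output_tokens):
    """Return the USD cost of a workload at the quoted per-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 10M input tokens and 1M output tokens.
for model in PRICES:
    print(f"{model}: ${workload_cost(model, 10_000_000, 1_000_000):.2f}")
```

On that assumed workload the difference is roughly 3.5x between Haiku 4.5 and MiniMax M2.7; the ratio shifts with the input/output mix, and it says nothing about output quality.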

lanyard-textile an hour ago

Since they're opening it up publicly on IRC here, the safety rails might be a consideration. I built an agent recently, and that's why I'm paying a premium to Anthropic atm -- though I'm still experimenting to see if it's really necessary.

It's getting some organic usage -- 100M input tokens for just chats this month -- and I've seen enough users try to throw Haiku against the wall and fail to trick it into misbehaving. It "pumps the brakes" a lot and imitates annoyance when you ask it repeatedly :) It handles emotionally driven real-life questions mid-conversation well. It just works.

I'm not seeing all of that consistently with other models I've tried so far -- but I've assumed it's not a completely fair comparison with (e.g.) open-weight models, since these safety rails presumably don't always arise from the natural model calls.

nl 5 hours ago

Xiaomi Mimo v2-Flash is fantastic.

I have a relatively hard personal agentic benchmark, and Mimo v2-Flash scores 8% higher in 109 seconds for $0.003 (0.3 cents!) vs. Haiku, which took 262 seconds for $0.24 (24 cents).

Gemini 3.1 Flash Lite Preview (yes, that is its name) is also a solid choice.
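For scale, the benchmark figures quoted above work out to roughly an 80x cost difference and a 2.4x speed difference. A quick sketch of that arithmetic, using only the numbers from the comment (the variable names are illustrative):

```python
# Per-run figures from the benchmark comment above.
haiku = {"seconds": 262, "usd": 0.24}
mimo = {"seconds": 109, "usd": 0.003}

cost_ratio = haiku["usd"] / mimo["usd"]          # how many times cheaper Mimo was
time_ratio = haiku["seconds"] / mimo["seconds"]  # how many times faster Mimo was

print(f"{cost_ratio:.0f}x cheaper, {time_ratio:.1f}x faster")
```

This is a single run of one personal benchmark, so the ratios are indicative rather than general.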

ruguo 7 hours ago

MiniMax M2.7 is actually pretty solid. I’ve been using it for coding lately and it handles most tasks just fine, but Opus 4.6 is still on another level.

jeremyjh 7 hours ago

MiniMax's Token Plan is even less expensive and agent usage is explicitly allowed.

faangguyindia 7 hours ago

Just use Gemini 3 Flash; it's better than Haiku.

	0123456789ABCDE 26 minutes ago
		unless gp really cares about lower hallucination rates https://artificialanalysis.ai/?omniscience=omniscience-hallu...
	attentive 5 hours ago
		or better yet 3.1 Flash-Lite at $0.25/1M input

ls612 7 hours ago

Because this is probably paid marketing by Anthropic?