InputName 6 hours ago

Looks at first graph. It's SWE-Bench Verified. A benchmark OpenAI stopped using two months ago due to contamination.

Doesn't look too promising. Is there any reason to consider Mistral other than that it's not from the US?

2ndorderthought 4 hours ago | parent | next [-]

They did not stop using it due to contamination. They said it's flawed and indirectly said Anthropic's results were impossible. It's very possible they are sore losers.

tpurves 6 hours ago | parent | prev | next [-]

If it's not US and it's within a few percent of SOTA, that might be good enough for a lot of people (e.g. Europeans).

NitpickLawyer 6 hours ago | parent [-]

Gemma has been better for us at EU languages than Mistral (for comparably sized models) :/ so ... dunno. What Mistral does well, and where others are lagging behind, is deploying on-prem with their engineers and know-how, offering tuned models for your tasks and finetuning on your own data. (I expect Google to start offering this next.)

deaux 5 hours ago | parent [-]

It's sad that despite their strength in this on-prem, they're so far behind on it in the cloud. No publicly available cloud SFT at all. Meanwhile Google has been offering that for years, though it remains to be seen if they will for Gemini 3 when it goes GA.

And on top of that, a range of providers like Fireworks and so on offer it for Chinese models. This seems like such an obvious thing for Mistral to offer.

amunozo 6 hours ago | parent | prev [-]

Price and speed.