constantcrying 8 hours ago

The lack of the comparison (which was absolutely done internally) tells you exactly what you need to know.

bildung 7 hours ago | parent | next [-]

I think people from the US often aren't aware of how many companies in the EU simply won't risk losing their data to the providers you have in mind: OpenAI, Anthropic, and Google. They are simply not an option at all.

The company I work for, for example, a mid-sized tech business, is currently investigating its local hosting options for LLMs. So Mistral will certainly be an option, alongside the Qwen family and Deepseek.

Mistral is positioning themselves for that market, not the one you have in mind. Comparing their models with Claude etc. would mean associating themselves with the data leeches, which they probably want to avoid.

adam_patarino 5 hours ago | parent | next [-]

We're seeing the same thing at many companies, even in the US. Exposing your entire codebase to an unreliable third party is not exactly SOC/ISO compliant. This is one of the core things that motivated us to develop cortex.build: put the model on the developer's machine and completely isolate the code, without complicated model deployment and maintenance.

BoorishBears 6 hours ago | parent | prev [-]

Mistral is founded by multiple Meta engineers, no?

Funded mostly by US VCs?

Hosted primarily on Azure?

Do you really have to go out of your way to start calling their competition "data leeches" for out-executing them?

troyvit 4 hours ago | parent | next [-]

It's wayyyy too early in the game to say who is out-executing whom.

I mean, why do you think those guys left Meta? It reminds me of a time ten years ago when I was sitting on a flight next to a guy who worked in the natural gas industry. I was (cough, still am) a pretty naive environmentalist, so I asked him what he thought of solar, wind, etc., and why we should be investing in natural gas when there are all these other options. His response was simple: natural gas can serve as a bridge from hydrocarbons to truly green energy sources. Leverage that dense energy to springboard the other sources in the mix, and you build a path forward to carbon-free energy.

I see Mistral's use of US VCs the same way. Those VCs are hedging their bets and maybe hoping to make a few bucks. A few of them are probably involved because they were buddies with the former Meta guys back in the day. If Mistral executes on their plan of being a transparent b2b option with solid data protections, then they will have used those VCs the way VCs deserve to be used, and the VCs make a few bucks. If Europe ever catches up to the US in terms of data centers, would Mistral move off of Azure? I'd bet $5 that they would.

sofixa 4 hours ago | parent | prev [-]

Mistral are mostly focusing on b2b, and on customers that want to self-host (banks and the like). So their founders being from Meta, or where their cloud platform is hosted, is entirely irrelevant to the story.

BoorishBears 4 hours ago | parent [-]

The fact they would not exist without the leeches and built their business on the leeches is irrelevant.

Pan-nationalism is a hell of a drug: a company that does not know you exist puts out an objectively awful release, and people take frank discussion of it as a personal slight.

baq 2 hours ago | parent | next [-]

If you want to allocate capital efficiently at planet scale, you have to ignore nations to the largest extent possible.

sofixa 4 hours ago | parent | prev [-]

> The fact they would not exist without the leeches and built their business on the leeches is irrelevant.

How so?

popinman322 8 hours ago | parent | prev | next [-]

They're comparing against open weights models that are roughly a month away from the frontier. Likely there's an implicit open-weights political stance here.

There are also plenty of reasons not to use proprietary US models for comparison: The major US models haven't been living up to their benchmarks; their releases rarely include training & architectural details; they're not terribly cost effective; they often fail to compare with non-US models; and the performance delta between model releases has plateaued.

A decent number of users in r/LocalLlama have reported switching back from Opus 4.5 to Sonnet 4.5 because Opus's real-world performance was worse. From my vantage point, it seems like trust in OpenAI, Anthropic, and Google is waning, and this lack of comparison is another symptom.

kalkin 7 hours ago | parent | next [-]

Scale AI wrote a paper a year ago comparing various models' performance on public benchmarks to their performance on similar but held-out questions. Generally, the closed-source models performed better, and Mistral came out looking pretty bad: https://arxiv.org/pdf/2405.00332

extr 7 hours ago | parent | prev [-]

??? Closed US frontier models are vastly more effective than anything OSS right now. The reason they didn't compare is that they're in a different weight class (and therefore a different product), and the comparison would be a bit unfair.

We're actually at a unique point right now where the gap is larger than it has been in some time. The consensus since the latest batch of releases is that we haven't found the wall yet. 5.1 Max, Opus 4.5, and G3 are absolutely astounding models, and unless you have unique requirements some way down the price/perf curve, I would not even look at this release (which is fine!).

tarruda 8 hours ago | parent | prev | next [-]

Here's what I understood from the blog post:

- Mistral Large 3 is comparable with the previous Deepseek release.

- Ministral 3 LLMs are comparable with older open LLMs of similar sizes.

constantcrying 8 hours ago | parent [-]

And implicit in this is that it compares very poorly to SOTA models. Do you disagree with that? Do you think these models are beating SOTA and they did not include the benchmarks because they forgot?

saubeidl 7 hours ago | parent | next [-]

Those are SOTA for open models. It's a separate league from closed models entirely.

supermatt 7 hours ago | parent [-]

> It's a separate league from closed models entirely.

To be fair, the SOTA models aren't even a single LLM these days. They are doing all manner of tool use and specialised submodel calls behind the scenes - a far cry from in-model MoE.
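
As a rough illustration (purely a hypothetical Python sketch; no vendor has published this, and the routing heuristics and function names here are made up), a system-level "model" behaves more like an orchestrator dispatching to tools and submodels, whereas MoE routing happens inside a single network's layers:

    # Hypothetical sketch: a system-level "model" that routes requests to
    # tools or specialised submodels before answering. Not any vendor's
    # real architecture.

    def calculator(expr: str) -> str:
        # Toy tool call (eval is unsafe outside a demo like this).
        return str(eval(expr, {"__builtins__": {}}))

    def code_submodel(prompt: str) -> str:
        return f"[code-specialised model answers: {prompt}]"

    def general_model(prompt: str) -> str:
        return f"[general model answers: {prompt}]"

    def answer(prompt: str) -> str:
        # Crude keyword heuristics standing in for a learned router.
        if any(c.isdigit() for c in prompt) and any(op in prompt for op in "+-*/"):
            return calculator(prompt)
        if "code" in prompt.lower():
            return code_submodel(prompt)
        return general_model(prompt)

    print(answer("12 * 34"))              # dispatched to a tool
    print(answer("write code to sort"))   # dispatched to a submodel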

tarruda 8 hours ago | parent | prev [-]

> Do you disagree with that?

I think that Qwen3 8B and 4B are SOTA for their size. The GPQA Diamond accuracy chart is weird: both Qwen3 8B and 4B have higher scores, so they used this weird chart where the x-axis shows the number of output tokens. I missed the point of this.

meatmanek 4 hours ago | parent [-]

Generation time is more or less proportional to tokens * model size, so if you can get the same quality result with fewer tokens from the same size of model, then you save time and money.
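
A back-of-the-envelope sketch of that tradeoff (all token counts and prices below are made-up placeholders, not real model pricing):

    # Toy comparison: two same-size models reaching the same accuracy,
    # one needing far more output tokens. Numbers are hypothetical.

    def generation_cost(output_tokens: int, price_per_mtok: float) -> float:
        # Dollar cost at a given price per million output tokens.
        return output_tokens * price_per_mtok / 1_000_000

    terse = {"tokens": 2_000, "price_per_mtok": 0.20}
    verbose = {"tokens": 12_000, "price_per_mtok": 0.20}

    for name, m in [("terse", terse), ("verbose", verbose)]:
        cost = generation_cost(m["tokens"], m["price_per_mtok"])
        print(f"{name}: ${cost:.6f} per query")

    # Same accuracy, ~6x the cost and wall-clock time for the verbose model,
    # which is the information an accuracy-vs-output-tokens chart encodes.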

crimsoneer 8 hours ago | parent | prev [-]

If someone is using these models, they probably can't or won't use the existing SOTA models, so I'm not sure how useful those comparisons actually are. "Here is a benchmark that makes us look bad, from a model you can't use, on a task you won't be undertaking" isn't actually helpful (and definitely not in a press release).

constantcrying 8 hours ago | parent [-]

Completely agree that there are legitimate reasons to prefer comparison to e.g. Deepseek models. But that doesn't change my point: we both agree that the comparisons would be extremely unfavorable.

Lapel2742 8 hours ago | parent [-]

> that the comparisons would be extremely unfavorable.

Why should they compare apples to oranges? Mistral Large 3 costs ~1/10th of Sonnet 4.5. They clearly target different users. If you want a coding assistant, you probably wouldn't choose this model, for various reasons. There is room for more than just the benchmark king.

constantcrying 7 hours ago | parent [-]

Come on. Do you just not read posts at all?

esafak 7 hours ago | parent [-]

Which lightweight models do these compare unfavorably with?