If someone is using these models, they probably can't or won't use the existing SOTA models, so not sure how useful those comparisons actually are. "Here is a benchmark that makes us look bad from a model you can't use on a task you won't be undertaking" isn't actually helpful (and definitely not in a press release).

▲

constantcrying 7 hours ago | parent [-]

Completely agree that there are legitimate reasons to prefer comparisons to, e.g., DeepSeek models. But that doesn't change my point: we both agree that the comparisons would be extremely unfavorable.

▲

Lapel2742 7 hours ago | parent [-]

> that the comparisons would be extremely unfavorable.

Why should they compare apples to oranges? Ministral3 Large costs ~1/10th of Sonnet 4.5. They clearly target different users. If you want a coding assistant, you probably wouldn't choose this model, for various reasons. There is room for more than just the benchmark king.

▲

constantcrying 6 hours ago | parent [-]

Come on. Do you just not read posts at all?

▲

esafak 6 hours ago | parent [-]

Which lightweight models do these compare unfavorably with?