phillc73 8 hours ago

I am a Mistral Le Chat Pro subscriber. I specifically chose to test their offerings because they are European. I don't have the necessary local hardware to run really big models, therefore need to choose a cloud provider if I want LLM action.

I find the antics of Anthropic, OpenAI, Google, Microsoft distasteful and avoid their products where I can.

After testing Le Chat and Devstral-2 for a while, I felt their offering was good enough to stump up some cash for it. I appreciate that many of their models are open weights and Apache 2.0 licensed. In general, I've been happy enough with the service and quality.

Maybe others are better, but I have little reason to change right now. If curiosity gets the better of me, I'll be looking at Qwen, Kimi, GLM, Deepseek, other open weights models, before Anthropic and OpenAI.

0xbadcafebee 4 hours ago | parent | next [-]

Mistral models are definitely good enough. Most people fall for what I call the SOTA Logical Fallacy: whenever there is a 'better model', they think they need to use it, when less-powerful models actually perform the same tasks just as well. (it's an inverse form of the Shifting Baseline Syndrome: every time a new model comes out, people shift their baseline of what is acceptable, despite the fact that a previous baseline was acceptable for the same task)

Devstral Small 2 was (and remains) a particularly strong small coding model, even beating larger open weights. Mistral's "problem" is marketing; other providers ship model updates constantly so they remain in the news and seem like they're "beating" the competition. And it works: people get emotionally attached to brands and models, deciding who's better in the court of popular opinion, and that drives their choices (& dollars).

badsectoracula 25 minutes ago | parent | next [-]

TBH sometimes i feel like i'm "emotionally attached" to Mistral's models because i always end up using them :-P. However that is because, as you wrote, their small models (i only use local stuff) are very strong. In fact i was trying Qwen3.6 27B recently and while it is nice that it can do tool calls during the reasoning process (i had it confirm its thoughts by writing Python code) it often ended up confusing itself (regardless of tool calls) during reasoning, ending up in loops where it questions itself over and over endlessly.

Devstral Small 2 however just works, for the most part. Qwen3.6 27B can probably handle more complex tasks (when i asked it as a test to write a function that checks for collision between two AABBs in C and gave it a tool to call Python code for confirmation, it actually wrote a Python script that writes C code with the tests, then calls GCC to compile the C code and runs the binary to run the tests, which is something Mistral's small models couldn't do) but i always felt i could just leave DS2 doing stuff in the background (or when i'm doing something else) and it'll produce something relatively useful, whereas in the little time i spent with Qwen3.6 27B it felt more "unstable" (and much slower, both because of literally slower inference and because of endless reams of text).
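
The AABB check itself is simple once specified clearly: two boxes overlap iff their intervals overlap on every axis. A minimal sketch of that predicate in Python (the test above asked for C; this is just the same logic for illustration, and the function name is mine, not from the thread):

```python
def aabb_overlap(a_min, a_max, b_min, b_max):
    """Axis-aligned bounding boxes overlap iff their intervals
    overlap on every axis. Each argument is a per-axis tuple of
    coordinates; touching edges count as overlapping here."""
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(a_min, a_max, b_min, b_max))
```

The per-axis interval test is the whole trick; a model that writes a compile-and-run harness for it is doing far more work than the predicate itself requires.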

Recently i also started using Ministral 3B and 14B - these can do some reasoning too and for very simple stuff Ministral 3B is very fast (i actually didn't expect a 3B model to be anything more than novelty) and have some vision abilities (though they're quite mediocre at vision so i haven't found much use for this - passing something via GLM-OCR to extract all text and feed it to another model feels more practical).

Also, as i wrote in another comment, every Mistral model i've tried never questioned me, which i certainly prefer.

amunozo 24 minutes ago | parent | prev | next [-]

For certain tasks that are not hard but depend on a clear specification, it's even better to have a less capable model, because it forces you to write a better description of what you want, ending up with better results. I will defend my PhD thesis soon, and I will buy a yearly Mistral subscription at the student price to get it cheap.

tmikaeld 2 hours ago | parent | prev [-]

My biggest issue with Devstral and even their biggest model is that they're dangerous unless closely directed and reviewed, and I mean CLOSELY. Unfortunately, Mistral models will believe and do anything.

See: https://petergpt.github.io/bullshit-benchmark/viewer/index.v...

Look at some of the test results; it's horrifying.

badsectoracula 39 minutes ago | parent [-]

FWIW personally i prefer this. When i tried Qwen3.6 and asked it a few questions, while it did respond, it was ADAMANT i should do something else when i really wanted an answer to the question i made. It felt like when you search something and a stackoverflow answer about what you search for comes up and the most upvoted answer is about using/doing something else - when you want a specific answer to that specific question, not something else.

Meanwhile Devstral Small 2 just answers the damn question.

I don't want to have to convince my computer to do what i want it to do, i want from it to do what i ask it to.

alxlaz 6 hours ago | parent | prev | next [-]

I use their API for several models, both for personal and professional use. I think their approach (smaller, specialised models that are well-adapted for specific tasks) is a very good fit for how I work. And even the more general-purpose ones, like the chat model, are just... refreshingly good in a lot of ways. My "ruthless review" prompt, which I use for, well, ruthless, guided reviews of early technical drafts, gives good technical results for early reviews, and holy crap is it ruthless and does it know how to swear. By the time Claude or ChatGPT are done being honest about how right I am to push back and gently circling back, Mistral's large model has sent me back to the drawing board twice.

Being in the EU does smooth a lot of things in terms of compliance, payment processing and whatnot, but I also like that their data retention and privacy policies are pretty clearly spelled out. If I need to know something, there's a good chance it's explained outright somewhere and I don't need to read between the EULA lines and wonder what it means.

I do hit limits in terms of capabilities sometimes, and I'm sure other providers' services offer better results for some things. But the businesses run on top of those more capable models feel too much like a scam at this point, and I'd rather not depend on them for anything I actually need.

dbl000 5 hours ago | parent [-]

That ruthless review prompt seems interesting; would you be willing to share it? I've been trying to have Claude act as a reviewer for me, and it feels like it will never disagree.

alxlaz 3 hours ago | parent [-]

It's very hard to untangle it from the rest of its context (the prompt is built dynamically, from a lot of parts, some project-specific, some specific to my preferences, some built from interaction history), so I can't really share it. In any case, I don't think it's some specific prompt engineering sorcery I'm doing; it's not like I've spent any real time refining it or experimenting with various magical incantations. It's probably just some model features making it more amenable to the kind of instructions that are relevant in these cases (directness, questioning trade-offs, thoroughness etc.). My chatbot swears equally graphically in review prompts and news summarizing prompts, so I'm pretty sure I'm not tickling the machine just right :)

altmanaltman 3 hours ago | parent [-]

Can you share some of its output for reference?

Havoc 7 hours ago | parent | prev [-]

There is also risk on the US regulatory side, as the recent drama around Anthropic showed.

Don’t think it’s inconceivable that the clowns in power decide to limit api access out of the blue one day because someone whispered a conspiracy theory in someone’s ear. API blockade…

See also the constant flip-flopping on what cards NVIDIA can export: no consistency in stance, no coherent policy.

tbrownaw 3 hours ago | parent | next [-]

You are conflating three very different things.

The thing with Anthropic and the military was about whether vendors can tell the military what operations it's permitted to do. It has no bearing on the commercial sector, and isn't actually about AI.

The thing with NVIDIA cards is a continuation of how we've restricted tech exports for quite a while. You can find old news articles about game consoles being export-restricted over nuclear proliferation concerns. This AI-related one was about whether or not custom AI models are relevant to national security, and whether restricting graphics card sales can have a meaningful impact on them.

Any issue with selling chat tokens internationally would be more akin to the recent tariff shenanigans.

trvz 5 hours ago | parent | prev | next [-]

Changing your LLM inference provider is the easiest switch in technology I can think of. It's quicker than taking off the case of your phone and putting on a new one.

Enough hardware and good models exist now that, if you do get blocked from one provider, viable alternatives exist.

Havoc 4 hours ago | parent | next [-]

> Changing your LLM inference provider is the easiest switch in technology I can think of.

That's true right up until you're working with confidential info in a corporate context. Then it's a multi-month, cross-discipline, cross-jurisdiction project, not an edit in a config file.

mring33621 4 hours ago | parent [-]

L O C A L M O D E L S

All data stays on computers that you control.

Same API. Localhost.
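
To illustrate the "same API" point (a sketch only: the endpoint path follows the OpenAI-compatible convention that llama.cpp's llama-server also speaks, and the model names are illustrative, not from the thread), the request payload is identical whether it goes to a hosted provider or to localhost; only the URL changes:

```python
def chat_request(base_url: str, model: str, prompt: str) -> tuple[str, dict]:
    """Build an OpenAI-compatible chat completion request.

    The body is provider-agnostic; switching providers, or moving
    to a local server, is just a different base URL and model name.
    """
    url = f"{base_url.rstrip('/')}/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, payload

# Hosted vs. local differ only in where the request is sent:
hosted = chat_request("https://api.mistral.ai", "mistral-small-latest", "hi")
local = chat_request("http://localhost:8080", "devstral", "hi")
```

This is why the config-file version of the switch is genuinely a one-line edit; the corporate-compliance version of the switch, as noted above, is a different beast entirely.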

mring33621 4 hours ago | parent [-]

Try Mistral-Nemo-2407-12B-Thinking-Claude-Gemini-GPT5.2-Uncensored-HERETIC_Q4_k_m.gguf. This 7.5GB model runs well in llama.cpp on my 2021 Macbook Pro and is good at both coding and business document analysis tasks.

NekkoDroid 2 hours ago | parent [-]

> Try Mistral-Nemo-2407-12B-Thinking-Claude-Gemini-GPT5.2-Uncensored-HERETIC_Q4_k_m.gguf.

This sounds like such a shitpost I initially thought you were joking... but this seems to be a real model???

cpburns2009 24 minutes ago | parent | next [-]

There's a method to the madness:

- Mistral-Nemo: the actual model developed by Mistral and Nvidia.

- 2407: likely the release date of the base model, July of 2024.

- 12B: the model has 12 billion parameters.

- Thinking: the model operates in thinking mode (generates a reasoning plan and ingests it before producing the actual output).

- Claude-Gemini-GPT5.2: I think this means the model was finetuned on session data from Claude, Gemini, and GPT5.2 to replicate their behavior.

- Uncensored-HERETIC: the model was uncensored using the automated Heretic method.

- Q4_k_m: the model is quantized (lossy compression) to ~5 bpw from the original 16 bpw.
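
As a back-of-envelope sanity check on the file size (a sketch; the ~4.85 bits/weight figure is an assumed effective rate for Q4_K_M-style quantization, not stated in the thread):

```python
# 12B parameters at an assumed effective ~4.85 bits/weight for Q4_K_M
params = 12e9
bits_per_weight = 4.85
size_gb = params * bits_per_weight / 8 / 1e9  # ≈ 7.3 GB
```

That lands close to the ~7.5 GB file mentioned upthread; the small remainder is plausibly tokenizer data, metadata, and tensors kept at higher precision.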

NekkoDroid 15 minutes ago | parent [-]

Yea, I know what the parts individually mean. I just meant that, as a whole, it seemed so absurd.

mring33621 2 hours ago | parent | prev [-]

It is! I like to try the variations from possibly 'interesting' people.

Some of them are good. Others randomly break into gibberish and Chinese poetry(?).

5 hours ago | parent | prev [-]
[deleted]
notTheLastMan 6 hours ago | parent | prev [-]

[dead]