trvz 5 hours ago

Changing your LLM inference provider is the easiest switch in technology I can think of. It's quicker than taking off the case of your phone and putting on a new one.

Enough hardware and good models exist now that if you do get blocked from one provider, viable alternatives exist.
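Most providers expose OpenAI-compatible endpoints, so the switch can be as small as two strings. A minimal sketch (the URL and model id are placeholders, not real endpoints):

    from openai import OpenAI

    # Swap these two strings to move from provider A to provider B.
    client = OpenAI(base_url="https://api.provider-a.example/v1", api_key="...")

    resp = client.chat.completions.create(
        model="some-model-id",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)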

Havoc 4 hours ago | parent | next [-]

> Changing your LLM inference provider is the easiest switch in technology I can think of.

That's true right up until you're working with confidential info in a corporate context. Then it's a multi-month, cross-discipline, cross-jurisdiction project, not an edit in a config file.

mring33621 4 hours ago | parent [-]

L O C A L M O D E L S

All data stays on computers that you control.

Same API. Localhost.
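For instance, llama.cpp's llama-server exposes an OpenAI-compatible endpoint on localhost, so existing client code works as-is (the model filename below is illustrative):

    # In a shell: llama-server -m mistral-nemo-12b-q4_k_m.gguf --port 8080
    from openai import OpenAI

    # Same client as for any hosted provider; only the base_url changes.
    # llama-server ignores the API key.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

    resp = client.chat.completions.create(
        model="local",  # llama-server serves whatever model it was started with
        messages=[{"role": "user", "content": "Summarize this contract clause."}],
    )
    print(resp.choices[0].message.content)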

mring33621 4 hours ago | parent [-]

Try Mistral-Nemo-2407-12B-Thinking-Claude-Gemini-GPT5.2-Uncensored-HERETIC_Q4_k_m.gguf. This 7.5GB model runs well in llama.cpp on my 2021 MacBook Pro and is good at both coding and business document analysis tasks.

NekkoDroid 2 hours ago | parent [-]

> Try Mistral-Nemo-2407-12B-Thinking-Claude-Gemini-GPT5.2-Uncensored-HERETIC_Q4_k_m.gguf.

This sounds like such a shitpost I initially thought you were joking... but this seems to be a real model???

cpburns2009 22 minutes ago | parent | next [-]

There's a method to the madness:

- Mistral-Nemo: the actual model developed by Mistral and Nvidia.

- 2407: likely the release date of the base model, July of 2024.

- 12B: the model has 12 billion parameters.

- Thinking: the model operates in thinking mode (generates a reasoning trace first and ingests it before producing the actual output).

- Claude-Gemini-GPT5.2: I think this means the model was fine-tuned on session data from Claude, Gemini, and GPT5.2 to replicate their behavior.

- Uncensored-HERETIC: the model was uncensored using the automated Heretic method.

- Q4_k_m: the model is quantized (lossy compression) to ~5 bpw from the original 16 bpw; a quick size check below.
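A back-of-the-envelope check that this quant level matches the 7.5GB file mentioned upthread (taking ~4.8 effective bits per weight for Q4_K_M as an assumption, and ignoring embedding/metadata overhead):

    # Rough GGUF size estimate: parameter count * bits per weight / 8
    params = 12e9  # 12B parameters
    bpw = 4.8      # assumed effective bits per weight for Q4_K_M
    size_gb = params * bpw / 8 / 1e9
    print(f"~{size_gb:.1f} GB")  # ~7.2 GB, close to the 7.5 GB file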

NekkoDroid 13 minutes ago | parent [-]

Yeah, I know what the parts individually mean. I just meant that as a whole it seemed so absurd.

mring33621 2 hours ago | parent | prev [-]

It is! I like to try the variations from possibly 'interesting' people.

Some of them are good. Others randomly break into gibberish and Chinese poetry(?).
