msp26 6 hours ago

The new large model uses the DeepseekV2 architecture. Zero mentions of that on the page, lol.

It's a good thing that open-source models use the best arch available. K2 does the same, but at least it mentions "Kimi K2 was designed to further scale up Moonlight, which employs an architecture similar to DeepSeek-V3".

---

vllm/model_executor/models/mistral_large_3.py

```
from vllm.model_executor.models.deepseek_v2 import DeepseekV3ForCausalLM

class MistralLarge3ForCausalLM(DeepseekV3ForCausalLM):
    ...
```
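To spell out what that snippet implies: this is the plain Python subclass-aliasing pattern, where a "new" model class inherits an existing architecture's implementation verbatim and only the registered name differs. A minimal sketch with stand-in classes (these are not the real vLLM definitions):

```python
class DeepseekV3ForCausalLM:
    """Stand-in for the existing architecture implementation."""

    def forward(self, tokens):
        # Placeholder for the real forward pass.
        return [t * 2 for t in tokens]


class MistralLarge3ForCausalLM(DeepseekV3ForCausalLM):
    # Empty body: the subclass reuses the parent implementation
    # unchanged; it exists only so the framework can register the
    # architecture under a different name.
    ...


model = MistralLarge3ForCausalLM()
print(model.forward([1, 2, 3]))  # inherited behavior: [2, 4, 6]
```

So the one-line class definition in vLLM is effectively saying "this architecture is DeepSeek-V3 under another name."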

"Science has always thrived on openness and shared discovery." btw

Okay I'll stop being snarky now and try the 14B model at home. Vision is good additional functionality on Large.

Jackson__ 2 hours ago | parent | next [-]

So they spent all of their R&D copying DeepSeek, leaving none for the singular novel added feature: vision.

To quote the hf page:

>Behind vision-first models in multimodal tasks: Mistral Large 3 can lag behind models optimized for vision tasks and use cases.

Ey7NFZ3P0nzAe an hour ago | parent [-]

Well, it says behind "vision-first models", not behind language models.

Of course models built purely for image tasks will completely outclass it. Vision-language models are useful for their generalist capabilities.

halJordan an hour ago | parent | prev | next [-]

I don't think it's fair to demand everything be open and then get mad when that openness is used. It's an obsessive and harmful double standard.

make3 3 hours ago | parent | prev [-]

Architecture differences, both relative to vanilla transformers and between modern transformers, are a tiny part of what makes a model nowadays.