Aurornis 4 hours ago

This is an open weights model based on other open weights models.

The dispute is that they released it with claims about having done some post training that improved the outputs. It was discovered that the model was not post trained like they claimed.

The HF page now says it’s a merge of models, which wasn’t there before. They’re trying to claim they accidentally uploaded the wrong model to HF and that they’ll upload the real one soon.

Basically, they thought they could splice two open weights models together and claim their team had accomplished some amazing post training, but they weren’t smart enough to realize that other researchers would discover that there wasn’t any post training.

moritzwarhier 4 hours ago | parent | next [-]

Thanks for the factual clarification. This is so important when everyone already has their trigger finger on politics. That's not to say politics are irrelevant here; see the sister comment by jobim.

But it's impossible to form a nuanced opinion when political association takes priority over the facts, which, again, don't look flattering for the implementers.

iknowstuff 3 hours ago | parent | prev [-]

How do they just splice two models together?

Aurornis 3 hours ago | parent | next [-]

The Nex N2 model they merged is based on Qwen 3.5, so you can swap pieces of one into the other. They found a combination of the two that did well on some benchmarks and shipped it.

In the early days of Llama there were a lot of experiments like this. There were even some interesting combinations of models where they stacked layers of different models together or even added more layers with interesting results.

But announcing that you spliced two models together isn't very impressive in 2026, so they announced that they had done their own post training and outdid the big labs. They thought nobody would look close enough to notice.
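
To make "swap pieces of one into the other" concrete, here is a minimal sketch of one common kind of merge: a per-parameter linear interpolation between two checkpoints that share the same architecture. This is an illustration only, not the recipe used for the model discussed here. The `merge_weights` helper, the parameter names, and the toy values are all hypothetical, and plain Python lists stand in for real tensors.

```python
def merge_weights(a, b, alpha):
    """Linearly interpolate two checkpoints with identical parameter names and shapes.

    Hypothetical helper: `a` and `b` map parameter names to flat lists of floats.
    The result is alpha * a + (1 - alpha) * b for every parameter.
    """
    if a.keys() != b.keys():
        raise ValueError("checkpoints must have identical parameter names")
    merged = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [alpha * x + (1.0 - alpha) * y
                        for x, y in zip(a[name], b[name])]
    return merged


# Toy "checkpoints" (made-up values) with the same layout.
model_a = {"layer0.weight": [1.0, 2.0], "layer1.weight": [0.0, 4.0]}
model_b = {"layer0.weight": [3.0, 0.0], "layer1.weight": [2.0, 2.0]}

merged = merge_weights(model_a, model_b, alpha=0.5)
```

This only works because both checkpoints descend from the same base, so parameter names and shapes line up one to one.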

ninja3925 3 hours ago | parent | prev [-]

Out of curiosity, how was it discovered? You would have to look for it to find this linear combination.
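
For what "finding a linear combination" could look like in practice, here is a rough sketch under the assumption that the suspect weights are roughly alpha * A + (1 - alpha) * B for two known models A and B. It is not the procedure from the linked issue; `fit_mix_coefficient` and the toy vectors are hypothetical. The idea is to fit alpha by least squares and then check how small the leftover residual is.

```python
def fit_mix_coefficient(merged, a, b):
    """Least-squares fit of alpha in merged ~ alpha * a + (1 - alpha) * b.

    Hypothetical helper: all arguments are flat lists of floats of equal length.
    Returns (alpha, residual_norm); a residual near zero suggests a pure blend.
    """
    d = [x - y for x, y in zip(a, b)]        # direction from b to a
    t = [m - y for m, y in zip(merged, b)]   # offset of merged from b
    denom = sum(v * v for v in d)
    if denom == 0.0:
        raise ValueError("a and b are identical; alpha is undefined")
    alpha = sum(u * v for u, v in zip(t, d)) / denom
    residual = sum((u - alpha * v) ** 2 for u, v in zip(t, d)) ** 0.5
    return alpha, residual


# Toy weight vectors (made-up values).
a = [1.0, 2.0, 0.0, 4.0]
b = [3.0, 0.0, 2.0, 2.0]
blend = [0.25 * x + 0.75 * y for x, y in zip(a, b)]   # a pure merge
unrelated = [0.0, 0.0, 5.0, 0.0]                       # not a merge of a and b

alpha_blend, res_blend = fit_mix_coefficient(blend, a, b)
alpha_other, res_other = fit_mix_coefficient(unrelated, a, b)
```

Real post-training would move the weights off the line between A and B, so the residual would not vanish the way it does for the pure blend.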

jdiff 2 hours ago | parent | next [-]

Without the system prompt, asking its name results in it responding with the name of the model they're ripping from. That would certainly draw your eyes to the right places.

valleyer 2 hours ago | parent [-]

Why is this? Do labs reinforce the model name during training? I was under the impression that this sort of "self-knowledge" always came from the system prompt, but I guess not...

jdiff an hour ago | parent [-]

Yes. In this case, during fine-tuning. Other blurbs that get baked in during fine-tuning are also perfectly reproducible from the Nex model. The details in the linked issue are quite accessible.

Aurornis 3 hours ago | parent | prev [-]

Check the linked GitHub issue. They explain their process.

Scroll past the first issue to find it. It’s further down.