VoidWhisperer 2 days ago

https://github.com/nex-agi/Nex-N2/issues/4

It seems they didn't make/train a novel model; they merged two existing models and then gave the result an instruction to say it was 'Rio, trained by Rio AI Labs'.

w4yai 2 days ago

> The model is built via a merge of https://huggingface.co/nex-agi/Nex-N2-Pro and https://huggingface.co/Qwen/Qwen3.5-397B-A17B, proceeded by On-Policy Distillation from a stronger model. We detected an incorrect upload in the previous version, where the base merged version was uploaded instead of the final distilled model. We are sorry for the confusion and apologize profusely.

https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...
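For anyone unfamiliar with the two steps in that quote, here is a toy sketch. The model card does not say which merge method was used, so the linear interpolation below (element-wise weighted averaging of two checkpoints with identical parameter names and shapes) is an assumption, chosen only because it is the simplest common merge. The distillation part is likewise a generic illustration, not their recipe: "on-policy" means the student generates the sequence and the teacher scores each token, and one common loss for that is per-token reverse KL(student || teacher). All function names and the toy checkpoints are hypothetical; real checkpoints would be tensors, not Python lists.

```python
import math


def linear_merge(ckpt_a, ckpt_b, alpha=0.5):
    """Hypothetical linear merge: alpha * A + (1 - alpha) * B for every parameter.

    Only meaningful when both checkpoints share parameter names and shapes.
    """
    if ckpt_a.keys() != ckpt_b.keys():
        raise ValueError("checkpoints have different parameter names")
    merged = {}
    for name in ckpt_a:
        a, b = ckpt_a[name], ckpt_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]
    return merged


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at a single token position."""
    ls = log_softmax(student_logits)
    lt = log_softmax(teacher_logits)
    return sum(math.exp(s) * (s - t) for s, t in zip(ls, lt))


def on_policy_distill_loss(student_logits_seq, teacher_logits_seq):
    """Mean per-token reverse KL over one sequence.

    Assumes the sequence was sampled from the student (that is what makes it
    on-policy); this function only scores it and does no sampling itself.
    """
    terms = [reverse_kl(s, t) for s, t in zip(student_logits_seq, teacher_logits_seq)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Toy checkpoints standing in for two models with the same architecture.
    ckpt_a = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.0]}
    ckpt_b = {"layer0.weight": [3.0, 4.0], "layer0.bias": [1.0]}
    print(linear_merge(ckpt_a, ckpt_b))  # {'layer0.weight': [2.0, 3.0], 'layer0.bias': [0.5]}

    # Loss is zero when student and teacher agree, positive otherwise.
    print(on_policy_distill_loss([[2.0, 0.0]], [[0.0, 2.0]]) > 0)  # True
```

The point of the toy: a merge alone produces the "base merged version" the card mentions, and the distillation step is what was supposed to turn it into the released model.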

daquisu 2 days ago

It was a recent edit, though. Yesterday's snapshot: https://web.archive.org/web/20260613072958/https://huggingfa...

giancarlostoro 18 hours ago

How does that contradict that they uploaded the wrong model?

danieldrehmer a day ago

can you offer a 4-bit quantized version and name it Zé Pequeno, pretty please?

scotty79 20 hours ago

I'd love to see people figuring out how to build models from several smaller ones. We could then train small specialized models and deploy setups more optimized for any given task. Modular LLMs should be a thing.

giancarlostoro 18 hours ago

This is something I've been trying to figure out for a while. Some models are really good at following instructions, but their context window is too small, so I do wonder whether a cluster of smaller models would be feasible. I've been building a custom coding harness, so once it's nice and polished I might experiment with this more.

urbnspacecowboy a day ago

See discussion: https://news.ycombinator.com/item?id=48528371