sailingparrot 14 hours ago

> Nvidia has been using its newfound liquid funds to train its own family of models

Nvidia has always had its own family of models; it's nothing new and not something you should read too much into, IMHO. They use those as templates other people can leverage, and they are of course optimized for Nvidia hardware.

Nvidia has been training models in the Megatron family, as well as many others, since at least 2019, and Megatron was used as a blueprint by many players. [1]

[1] https://arxiv.org/abs/1909.08053

breput 14 hours ago | parent | next [-]

Nemotron-3-Nano-30B-A3B[0][1] is a very impressive local model. It is good with tool calling and works great with llama.cpp/Visual Studio Code/Roo Code for local development.

It doesn't get a ton of attention on /r/LocalLLaMA, but it is worth trying out, even if you have a relatively modest machine (a quick setup sketch follows the links below).

[0] https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B...

[1] https://huggingface.co/unsloth/Nemotron-3-Nano-30B-A3B-GGUF
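
If you want a quick way to poke at the tool-calling path, a sketch like the one below works against llama-server's OpenAI-compatible endpoint. Everything concrete here (the GGUF filename, port, model name string, and the read_file tool) is a placeholder for illustration, not anything official; adjust for your own build.

    # Assumes something like this is already running (exact flags depend on your llama.cpp build):
    #   llama-server -m Nemotron-3-Nano-30B-A3B-Q4_K_M.gguf --jinja --port 8080
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    # Hypothetical tool, just to see whether the model emits a structured call.
    tools = [{
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file from the workspace",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="local",  # llama-server serves whatever model it was started with
        messages=[{"role": "user", "content": "Open README.md and summarize it."}],
        tools=tools,
    )

    msg = resp.choices[0].message
    # A well-behaved tool-calling model returns tool_calls here rather than prose.
    print(msg.tool_calls or msg.content)

Roo Code and other OpenAI-compatible clients can point at the same local endpoint, which is how it slots into the VS Code setup mentioned above.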

bhadass 13 hours ago | parent | next [-]

Some of NVIDIA's models also tend to have interesting architectures, for example using the Mamba architecture instead of pure transformers: https://developer.nvidia.com/blog/inside-nvidia-nemotron-3-t...

nextos 12 hours ago | parent [-]

Deep SSMs, including the entire S4 to Mamba saga, are a very interesting alternative to transformers. In some of my genomics use cases, Mamba has been easier to train and scale over large context windows, compared to transformers.
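
For anyone who hasn't looked at these, the appeal is easy to show: the core of an S4/Mamba-style layer is a linear recurrence over a fixed-size hidden state, so compute and memory grow linearly with sequence length instead of quadratically as with attention. A toy numpy sketch (shapes and values are illustrative only; real Mamba layers make A, B, C input-dependent and use a parallel scan):

    import numpy as np

    def ssm_scan(x, A, B, C):
        """Diagonal linear state-space recurrence over one sequence.
        x: (L, d_in), A: (d_state,) diagonal decay, B: (d_state, d_in), C: (d_out, d_state)
        """
        h = np.zeros(A.shape[0])
        ys = []
        for t in range(x.shape[0]):      # O(L) scan; state size stays fixed, no L x L matrix
            h = A * h + B @ x[t]         # h_t = A h_{t-1} + B x_t
            ys.append(C @ h)             # y_t = C h_t
        return np.stack(ys)

    rng = np.random.default_rng(0)
    L, d_in, d_state, d_out = 4096, 8, 16, 8
    x = rng.standard_normal((L, d_in))
    A = np.full(d_state, 0.95)           # fixed decay here; Mamba's "selective" A depends on the input
    B = 0.1 * rng.standard_normal((d_state, d_in))
    C = 0.1 * rng.standard_normal((d_out, d_state))
    print(ssm_scan(x, A, B, C).shape)    # (4096, 8)

That fixed-size state is what makes very long inputs, like genomic sequences, comfortable compared to attention's quadratic cost.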

jychang 13 hours ago | parent | prev | next [-]

It was good for, like, one month. Qwen3 30B dominated for half a year before that, and GLM-4.7 Flash 30B took over the crown soon after Nemotron 3 Nano came out. There was basically no window for it to shine.

breput 13 hours ago | parent | next [-]

It is still good, even if not the new hotness. But I understand your point.

It isn't as though GLM-4.7 Flash is significantly better, and honestly, I have had poor experiences with it (and yes, always the latest llama.cpp and the updated GGUFs).

ThrowawayTestr 13 hours ago | parent | prev | next [-]

Genuinely exciting to be around for this. Reminds me of the time when computers were said to be obsolete by the time you drove them home.

binary132 12 hours ago | parent | prev [-]

I recently tried GLM-4.7 Flash 30b and didn’t have a good experience with it at all.

breput 10 hours ago | parent [-]

It feels like GLM has either a bit of a fan club or maybe some paid supporters...

deskamess 2 hours ago | parent | prev | next [-]

Do they have a good multilingual embedding model? Ideally, with a decent context size like 16/32K. I think Qwen has one at 32K. Even the Gemma contexts are pretty small (8K).

superjan 7 hours ago | parent | prev | next [-]

Oh those ghastly model names. https://www.smbc-comics.com/comic/version

binary132 12 hours ago | parent | prev [-]

I find the Q8 runs a bit more than twice as fast as gpt-120b, since I don't have to offload as many MoE layers, but it is just about as capable, if not better.

nl 7 hours ago | parent | prev | next [-]

NeMo is different from Megatron.

Megatron was a research project.

NVidia has professional services selling companies on using NeMo for user-facing applications.

retinaros 6 hours ago | parent | prev [-]

It's a finetune.