breput | 14 hours ago
Nemotron-3-Nano-30B-A3B [0][1] is a very impressive local model. It is good with tool calling and works great with llama.cpp/Visual Studio Code/Roo Code for local development. It doesn't get a ton of attention on /r/LocalLLaMA, but it is worth trying out, even if you have a relatively modest machine.

[0] https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B...
[1] https://huggingface.co/unsloth/Nemotron-3-Nano-30B-A3B-GGUF
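
If you want to try it, here is a minimal sketch of serving the GGUF from [1] with llama.cpp's llama-server (the filename is a placeholder for whichever quant you download, and flag defaults may vary across llama.cpp builds):

    # Serve an OpenAI-compatible API on localhost:8080.
    # --jinja applies the model's chat template, which tool calling relies on;
    # -ngl 99 offloads all layers to the GPU (lower it if VRAM is tight).
    llama-server -m Nemotron-3-Nano-30B-A3B-Q8_0.gguf \
      --ctx-size 16384 --jinja -ngl 99 --port 8080

Roo Code can then be pointed at http://localhost:8080/v1 as an OpenAI-compatible endpoint.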

bhadass | 13 hours ago
Some of NVIDIA's models also tend to have interesting architectures, for example the use of the Mamba architecture rather than a purely transformer-based design: https://developer.nvidia.com/blog/inside-nvidia-nemotron-3-t...
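
For anyone unfamiliar, the core idea behind Mamba is a linear state-space recurrence instead of attention. A simplified sketch of the recurrence (my gloss, not NVIDIA's exact formulation):

    h_t = Ā_t · h_{t-1} + B̄_t · x_t
    y_t = C_t · h_t

The "selective" part is that Ā_t, B̄_t, and C_t are computed from the current input x_t, and since the state h_t has a fixed size, cost grows linearly with sequence length rather than quadratically as with attention.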
| |||||||||||||||||||||||||||||

jychang | 13 hours ago
It was good for, like, one month. Qwen3 30B dominated for half a year before that, and GLM-4.7 Flash 30B took over the crown soon after Nemotron 3 Nano came out. There was basically no window for it to shine.
| |||||||||||||||||||||||||||||

deskamess | 2 hours ago
Do they have a good multilingual embedding model? Ideally with a decent context size, like 16/32K. I think Qwen has one at 32K. Even the Gemma contexts are pretty small (8K).

superjan | 7 hours ago
Oh, those ghastly model names. https://www.smbc-comics.com/comic/version

binary132 | 12 hours ago
I find the Q8 quant runs a bit more than twice as fast as gpt-oss-120b, since I don't have to offload as many MoE layers to the CPU, and it is just about as capable, if not better.
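
For anyone trying to reproduce this kind of split, llama.cpp can keep the MoE expert tensors in system RAM while the rest of the model sits on the GPU. A rough sketch (flag names from recent llama.cpp builds; the filename is a placeholder):

    # Keep the expert weights of the first 20 layers on the CPU,
    # offloading everything else to the GPU:
    llama-server -m Nemotron-3-Nano-30B-A3B-Q8_0.gguf -ngl 99 --n-cpu-moe 20

    # Same idea via a tensor-override regex (here keeping all expert
    # tensors on the CPU):
    llama-server -m Nemotron-3-Nano-30B-A3B-Q8_0.gguf -ngl 99 -ot "exps=CPU"

The fewer expert tensors you have to park on the CPU, the faster generation gets, which is where a 30B MoE pulls ahead of a 120B one on modest hardware.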