james2doyle · 3 days ago
I was playing with the new IBM Granite models. They are quick, small, and seem accurate. They are small enough that you can even try them in the browser, loaded straight from the filesystem: https://huggingface.co/spaces/ibm-granite/Granite-4.0-Nano-W... Not only are they much more recent than Gemma, they also seem really good at tool calling, so they are probably a good fit for coding tools, though I haven't personally tried them for that. The actual model page is here: https://huggingface.co/ibm-granite/granite-4.0-h-1b
firefax · 3 days ago
Interesting. Is there a way to load this into Ollama? Doing things in browser is a cool flex, but my interest is specifically in privacy-respecting LLMs -- my goal is to run the most powerful one I can on my personal machine, with the end goal being that those little queries I used to send to "the cloud" can be done offline, privately.
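(Editor's note on the Ollama question: Ollama can run GGUF checkpoints directly from Hugging Face by repo path, and it also carries Granite models in its own library. A minimal sketch, with the caveat that the exact repo and tag names below are assumptions -- check the Hugging Face org and the Ollama library page for what is actually published:)

```shell
# Option 1: pull a GGUF conversion straight from Hugging Face.
# Assumes a GGUF repo for the 1B model exists under the ibm-granite org;
# verify the repo name on huggingface.co before running.
ollama run hf.co/ibm-granite/granite-4.0-h-1b-GGUF

# Option 2: use a tag from the Ollama model library, if one is published.
# The tag name here is an assumption; check https://ollama.com/library.
ollama run granite4

# Confirm the model is installed and loaded locally:
ollama list
ollama ps
```

Either way the model runs entirely on your machine once downloaded, which fits the offline/private use case.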
brendoelfrendo · 3 days ago
Not the person you replied to, but thanks for this recommendation. These look neat! I'm definitely going to give them a try.