jwr 5 days ago

Hmm. 80B. These days I am on the lookout for new models in the 32B range, since that is what fits and runs comfortably on my MacBook Pro (M4, 64GB).

I use ollama every day for spam filtering: gemma3:27b works great, but I mostly use gpt-oss:20b because it's so much faster and comparable in accuracy.

bigyabai 4 days ago | parent | next [-]

The model is 80B parameters, but only 3B are activated per token during inference. I'm running the old 2507 Qwen3 30B model on my 8GB Nvidia card and get very usable performance.

coolspot 4 days ago | parent | next [-]

Yes, but you don't know in advance which 3B parameters you will need, so you either keep all 80B in VRAM or wait while the right 3B are loaded from NVMe -> RAM -> VRAM. And of course it can be a different 3B for each new token.
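
Back-of-the-envelope, assuming a ~4-bit quant (actual sizes vary with the quant format):

    # rough memory math for an 80B-total / 3B-active MoE
    bytes_per_param = 0.5                  # ~4-bit quantization
    resident  = 80e9 * bytes_per_param     # ~40 GB of weights you want in fast memory
    per_token = 3e9  * bytes_per_param     # ~1.5 GB actually read per token
    print(resident / 1e9, per_token / 1e9)   # -> 40.0 1.5

So roughly 40 GB resident: fine in 64 GB of unified memory, but on an 8 GB card most of it ends up offloaded to system RAM or streamed from disk.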

drozycki 4 days ago | parent [-]

The latest SSDs benchmark at 3GB/s and up. The marginal latency would be trivial compared to the inference time.

jwr 4 days ago | parent | prev [-]

I understand that, but whether it's usable depends on whether ollama can load parts of it into memory on my Mac, and how quickly.

bigyabai 3 days ago | parent [-]

I really don't recommend ollama. It's slow, is missing tons of llama.cpp features, and doesn't expose many settings to the user. Koboldcpp is a much better inference provider and even has an ollama-compatible API endpoint.

jabart 4 days ago | parent | prev | next [-]

Can you talk more about how you are using ollama for spam filtering?

jwr 4 days ago | parent [-]

I wrote a little thing that connects to my IMAP server (I run my own e-mail), goes through the unread e-mails in the inbox, processes them (parses MIME multipart, extracts HTML, describes images and links, etc.), and feeds them to an LLM with a prompt. The LLM decides whether the message is spam or not.
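
The skeleton is roughly this (a simplified sketch, not the actual code; host, credentials, model and prompt are placeholders, and the real thing also handles HTML parts, images and links):

    # sketch: pull unread mail over IMAP, ask a local LLM (via ollama's
    # /api/chat endpoint) whether each message is spam
    import email, imaplib, json, urllib.request

    SYSTEM_PROMPT = "You filter e-mail for <description of me>. Answer SPAM or LEGIT."

    def classify(text):
        req = urllib.request.Request(
            "http://localhost:11434/api/chat",
            data=json.dumps({
                "model": "gpt-oss:20b",
                "stream": False,
                "messages": [{"role": "system", "content": SYSTEM_PROMPT},
                             {"role": "user", "content": text[:8000]}],
            }).encode(),
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["message"]["content"].strip()

    imap = imaplib.IMAP4_SSL("mail.example.com")
    imap.login("me", "app-password")
    imap.select("INBOX")
    _, nums = imap.search(None, "UNSEEN")
    for num in nums[0].split():
        _, data = imap.fetch(num, "(RFC822)")
        msg = email.message_from_bytes(data[0][1])
        # plain-text parts only in this sketch
        body = "\n".join((p.get_payload(decode=True) or b"").decode(errors="replace")
                         for p in msg.walk()
                         if p.get_content_type() == "text/plain")
        print(num, classify((msg.get("Subject") or "") + "\n\n" + body))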

It's amazingly accurate.

The interesting thing is that, after some experimentation, I found it's best if the prompt doesn't describe what spam is. The LLMs are somewhat "intelligent", so the prompt now describes me: who I am, what I do, my interests, etc. That's much more effective and generalizes better to new kinds of spam.
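
Roughly along these lines (placeholders, not my actual wording):

    You are filtering incoming e-mail for <name>. I am a <profession> based in
    <country>. I work on <projects>, I'm interested in <topics>, and I correspond
    with <kinds of people>. Given the message below, decide whether I would want
    to read it. Answer with a single word: SPAM or LEGIT.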

And a nice side observation is that this kind of system requires no training (so I no longer collect samples of spam) and can't be gamed, because it describes me instead of describing specific kinds of spam.

I have to write it up in a blog post.

electroglyph 5 days ago | parent | prev [-]

It'll run great, it's an MoE.
