jwr 5 days ago

Hmm. 80B. These days I am on the lookout for new models in the 32B range, since that is what fits and runs comfortably on my MacBook Pro (M4, 64GB).

I use ollama every day for spam filtering: gemma3:27b works great, but I mostly use gpt-oss:20b because it's so much faster and comparable in accuracy.

bigyabai 4 days ago | parent | next [-]

The model is 80B parameters, but only 3B are activated per token during inference. I'm running the old 2507 Qwen3 30B model on my 8GB Nvidia card and get very usable performance.

coolspot 4 days ago | parent | next [-]

Yes, but you don't know in advance which 3B parameters you will need, so you either keep all 80B in VRAM or wait while the right 3B are loaded from NVMe -> RAM -> VRAM. And of course it can be a different 3B for each new token.
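
Back-of-the-envelope, assuming a ~4-bit quant (actual sizes vary with the quant format):

    # rough memory math for an 80B-total / 3B-active MoE
    bytes_per_param = 0.5                  # ~4-bit quantization
    resident  = 80e9 * bytes_per_param     # ~40 GB of weights you want in fast memory
    per_token = 3e9  * bytes_per_param     # ~1.5 GB actually read per token
    print(resident / 1e9, per_token / 1e9)   # -> 40.0 1.5

So roughly 40 GB resident: fine in 64 GB of unified memory, but on an 8 GB card most of it ends up offloaded to system RAM or streamed from disk.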

drozycki 4 days ago | parent [-]

The latest SSDs benchmark at 3GB/s and up. The marginal latency would be trivial compared to the inference time.

jwr 4 days ago | parent | prev [-]

I understand that, but whether it's usable depends on whether ollama can load parts of it into memory on my Mac, and how quickly.

bigyabai 3 days ago | parent [-]

I really don't recommend ollama. It's slow, is missing tons of llama.cpp features, and doesn't expose many settings to the user. Koboldcpp is a much better inference provider and even has an ollama-compatible API endpoint.

jabart 4 days ago | parent | prev | next [-]

Can you talk more about how you are using ollama for spam filtering?

jwr 4 days ago | parent [-]

I wrote a little thing that connects to my IMAP server (I run my own e-mail), goes through the unread e-mails in the inbox, processes them (parses MIME multipart, extracts HTML, describes images and links, etc.), and feeds them to an LLM with a prompt. The LLM decides whether the message is spam or not.
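
The skeleton is roughly this (a simplified sketch, not the actual code; host, credentials, model and prompt are placeholders, and the real thing also handles HTML parts, images and links):

    # sketch: pull unread mail over IMAP, ask a local LLM (via ollama's
    # /api/chat endpoint) whether each message is spam
    import email, imaplib, json, urllib.request

    SYSTEM_PROMPT = "You filter e-mail for <description of me>. Answer SPAM or LEGIT."

    def classify(text):
        req = urllib.request.Request(
            "http://localhost:11434/api/chat",
            data=json.dumps({
                "model": "gpt-oss:20b",
                "stream": False,
                "messages": [{"role": "system", "content": SYSTEM_PROMPT},
                             {"role": "user", "content": text[:8000]}],
            }).encode(),
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["message"]["content"].strip()

    imap = imaplib.IMAP4_SSL("mail.example.com")
    imap.login("me", "app-password")
    imap.select("INBOX")
    _, nums = imap.search(None, "UNSEEN")
    for num in nums[0].split():
        _, data = imap.fetch(num, "(RFC822)")
        msg = email.message_from_bytes(data[0][1])
        # plain-text parts only in this sketch
        body = "\n".join((p.get_payload(decode=True) or b"").decode(errors="replace")
                         for p in msg.walk()
                         if p.get_content_type() == "text/plain")
        print(num, classify((msg.get("Subject") or "") + "\n\n" + body))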

It's amazingly accurate.

The interesting thing is that, after some experimentation, I found it's best if the prompt doesn't describe what spam is. The LLMs are somewhat "intelligent", so the prompt now describes me: who I am, what I do, my interests, etc. That's much more effective and generalizes better to new kinds of spam.
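
Roughly along these lines (placeholders, not my actual wording):

    You are filtering incoming e-mail for <name>. I am a <profession> based in
    <country>. I work on <projects>, I'm interested in <topics>, and I correspond
    with <kinds of people>. Given the message below, decide whether I would want
    to read it. Answer with a single word: SPAM or LEGIT.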

And a nice side observation is that this kind of system requires no training (so I no longer collect samples of spam) and can't be gamed, because it describes me instead of describing specific kinds of spam.

I have to write it up in a blog post.

electroglyph 5 days ago | parent | prev [-]

It'll run great, it's an MoE.
