jwr, 2 days ago:
Really looking forward to testing and benchmarking this on my spam-filtering benchmark. gemma-3-27b was a really strong model, surpassed later by gpt-oss:20b (which was also much faster). Qwen models always had more variance.
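A minimal sketch of how such a benchmark harness might look, assuming an Ollama-style local API at localhost:11434; the prompt, model tag, and helper functions here are illustrative, not jwr's actual setup:

```python
# Minimal sketch of a spam-classification benchmark against a local model.
# Assumes an Ollama-style endpoint at http://localhost:11434; the prompt,
# label parsing, and accuracy metric are illustrative assumptions.
import json
import urllib.request

PROMPT = (
    "Classify the following email as SPAM or HAM. "
    "Answer with a single word.\n\n{email}"
)

def parse_label(model_output: str) -> str:
    """Map a free-form model reply onto a binary label (default: ham)."""
    words = model_output.strip().split()
    first = words[0].strip(".,!").upper() if words else ""
    return "spam" if first == "SPAM" else "ham"

def classify(email: str, model: str = "gemma3:27b") -> str:
    """Ask the local model for a label; requires a running Ollama server."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model,
                         "prompt": PROMPT.format(email=email),
                         "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_label(json.load(resp)["response"])

def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Fraction of benchmark examples the model labeled correctly."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)
```

Running the same labeled corpus through each model (and timing the calls) is what makes speed comparisons like gemma-3-27b vs gpt-oss:20b possible.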
|
    mhitza, 2 days ago:
    If you wouldn't mind chatting about your usage, my email is in my profile, and I'd love to share experiences with other HNers using self-hosted models.
|
    jeffbee, 2 days ago:
    Does spam filtering really need a better model? My impression is that the whole game is based on having the best and freshest user-contributed labels.
        drob518, 2 days ago:
        He said it’s a benchmark.
        hrmtst93837, 2 days ago:
        Better models help on the day the spam mutates, before you have fresh labels for the new scam and before spammers can infer, from a few test runs, which phrasing still slips through. If you need labels for each pivot, you're letting them experiment on your users.
            jeffbee, 2 days ago:
            In my experience the contents of the message are all but totally irrelevant to the classification, and it is the behavior of the mailing peer that gives all the relevant features.
                mh-, a day ago:
                Based on how much blatant gmail->gmail spam I receive, the gmail team agrees with this strategy.
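The behavior-based classification jeffbee describes can be sketched roughly as follows; the feature names, weights, and threshold are invented for illustration, not any provider's actual pipeline:

```python
# Illustrative sketch: scoring mail on sender behavior rather than content.
# All features, weights, and the threshold are made-up example values.
from dataclasses import dataclass

@dataclass
class PeerBehavior:
    spf_pass: bool          # peer passed SPF for its envelope domain
    dkim_pass: bool         # message carried a valid DKIM signature
    days_seen: int          # how long we have observed this peer sending to us
    msgs_last_hour: int     # sudden volume spikes are a classic spam signal
    prior_spam_ratio: float # fraction of this peer's past mail users flagged

def spam_score(p: PeerBehavior) -> float:
    """Combine behavioral features into a score in [0, 1]; higher = spammier."""
    score = 0.0
    if not p.spf_pass:
        score += 0.3
    if not p.dkim_pass:
        score += 0.2
    if p.days_seen < 2:          # brand-new peers are riskier
        score += 0.2
    if p.msgs_last_hour > 100:   # volume spike
        score += 0.1
    score += 0.2 * p.prior_spam_ratio
    return min(score, 1.0)

def is_spam(p: PeerBehavior, threshold: float = 0.5) -> bool:
    return spam_score(p) >= threshold
```

Note that none of these features look at the message body, which is why gmail-to-gmail spam (where the peer is Google's own well-behaved infrastructure) can sail through a purely behavioral filter.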
|