schopra909 12 hours ago

I think you nailed it.

For us it’s classifiers that we train for very specific domains.

You’d think it’d be better to just finetune a smaller non-LLM model, but empirically we find the LLM finetunes (like 7B) perform better.
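For concreteness, here is a minimal sketch of what that kind of LLM classifier fine-tune might look like, using LoRA so a 7B model fits on a single GPU. The model name, label set, and hyperparameters are illustrative assumptions, not the commenter's actual setup:

    # Sketch: fine-tuning a 7B decoder-only LLM as a domain classifier.
    # All names and values here are assumed for illustration.
    import torch
    from datasets import Dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    MODEL = "mistralai/Mistral-7B-v0.1"  # assumed; any 7B base works similarly

    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    tokenizer.pad_token = tokenizer.eos_token  # decoder-only models lack a pad token

    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL, num_labels=2, torch_dtype=torch.bfloat16
    )
    model.config.pad_token_id = tokenizer.pad_token_id

    # LoRA adapters keep the fine-tune tractable; full fine-tuning of all
    # 7B parameters would also work given enough memory.
    model = get_peft_model(model, LoraConfig(
        task_type="SEQ_CLS", r=8, lora_alpha=16, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],
    ))

    # Toy stand-in for a domain-specific labeled dataset.
    train = Dataset.from_dict({
        "text": ["refund my order", "how do I reset my password"],
        "label": [0, 1],
    }).map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=256))

    Trainer(
        model=model,
        args=TrainingArguments(output_dir="clf", per_device_train_batch_size=2,
                               num_train_epochs=3, learning_rate=2e-4, bf16=True),
        train_dataset=train,
        tokenizer=tokenizer,
    ).train()

The classification head on top of the pretrained backbone is what gets trained here; the intuition in the comment is that the backbone's general language understanding is doing most of the work.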

moffkalast 10 hours ago

I think it's no surprise that a model with a more general understanding of text performs better than some tiny ad-hoc classifier that blindly learns a couple of patterns and has no clue what it's looking at. The tiny classifier is going to fail in much weirder ways that make no sense, like old CNN-based vision models did.
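For contrast, the kind of "tiny ad-hoc classifier" being described might be something like TF-IDF n-grams plus logistic regression. This is an illustrative sketch with made-up data, not anything from the thread, but it shows why the failure modes look senseless: the model only sees surface token statistics.

    # Sketch: a small pattern-matching baseline with no language understanding.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = ["refund my order", "how do I reset my password"]  # toy data, assumed
    labels = [0, 1]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    clf.fit(texts, labels)

    # A paraphrase sharing no n-grams with the training set maps to an
    # all-zero feature vector, so the prediction is essentially arbitrary:
    # the "weird, senseless" failure mode described above.
    print(clf.predict(["give me my money back"]))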