schopra909 12 hours ago
I think you nailed it. For us it's classifiers that we train for very specific domains. You'd think it'd be better to just finetune a smaller non-LLM model, but empirically we find the LLM finetunes (around 7B parameters) perform better.
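(For anyone curious what this kind of LLM-classifier finetune might look like in practice, here is a minimal sketch using Hugging Face transformers. The base model name, label count, and CSV file are placeholders, not anything the commenter described, and a real 7B run would likely add LoRA/PEFT or similar to fit in memory.)

    from datasets import load_dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    BASE_MODEL = "mistralai/Mistral-7B-v0.1"  # placeholder ~7B base model
    NUM_LABELS = 4                            # placeholder number of domain labels

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    tokenizer.pad_token = tokenizer.eos_token  # decoder-only bases usually lack a pad token

    # Attach a classification head on top of the pretrained LLM
    model = AutoModelForSequenceClassification.from_pretrained(
        BASE_MODEL, num_labels=NUM_LABELS
    )
    model.config.pad_token_id = tokenizer.pad_token_id

    # Placeholder dataset: a CSV with "text" and "label" columns
    ds = load_dataset("csv", data_files={"train": "domain_train.csv"})

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=512)

    ds = ds.map(tokenize, batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="clf-7b",
            per_device_train_batch_size=1,
            gradient_accumulation_steps=16,
            learning_rate=1e-5,
            num_train_epochs=3,
            bf16=True,
        ),
        train_dataset=ds["train"],
        tokenizer=tokenizer,  # enables padded batching via the default collator
    )
    trainer.train()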
moffkalast 10 hours ago
I think it's no surprise that a model with a more general understanding of text performs better than some tiny ad-hoc classifier that blindly learns a couple of patterns and has no clue what it's looking at. The small classifier is going to fail in much weirder ways that make no sense, much like old CNN-based vision models did.