AJRF | 6 days ago
I've got a very real-world use case for DistilBERT: learning how to label WordPress articles. It's one of those things that's somewhat valuable (tagging) but not valuable enough to spend loads on compute for. The great thing is I have enough data (100k+ labeled articles) to fine-tune and run a meaningful classification report over. The data is very diverse, and while the labels aren't evenly distributed, I can deal with the imbalance with a few tricks. Can't wait to swap it out for this and see the changes in the scores. Will report back.
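Roughly, the pipeline looks like this (a minimal sketch with Hugging Face transformers, assuming single-label tags; `load_articles()` is a hypothetical stand-in for your own data loader, and class-weighted loss stands in for the imbalance tricks):

```python
import numpy as np
import torch
from datasets import Dataset
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "distilbert-base-uncased"

# Hypothetical loader: returns article texts and integer tag ids.
texts, labels = load_articles()

train_texts, eval_texts, train_labels, eval_labels = train_test_split(
    texts, labels, test_size=0.1, stratify=labels, random_state=42)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_ds = Dataset.from_dict({"text": train_texts, "label": train_labels}).map(tokenize, batched=True)
eval_ds = Dataset.from_dict({"text": eval_texts, "label": eval_labels}).map(tokenize, batched=True)

num_labels = len(set(labels))
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=num_labels)

# One imbalance trick: weight the loss by inverse class frequency.
class_weights = torch.tensor(
    compute_class_weight("balanced", classes=np.unique(train_labels), y=train_labels),
    dtype=torch.float)

class WeightedTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fct = torch.nn.CrossEntropyLoss(weight=class_weights.to(outputs.logits.device))
        loss = loss_fct(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss

args = TrainingArguments(output_dir="tag-classifier",
                         num_train_epochs=3,
                         per_device_train_batch_size=16)

trainer = WeightedTrainer(model=model, args=args, train_dataset=train_ds,
                          eval_dataset=eval_ds, tokenizer=tokenizer)
trainer.train()

# Classification report over the held-out split.
preds = trainer.predict(eval_ds)
print(classification_report(eval_labels, preds.predictions.argmax(-1)))
```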
minimaxir | 6 days ago
ModernBERT may be a better base model if you're fine-tuning for a specific use case: https://huggingface.co/blog/modernbert
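If the fine-tuning code already goes through the `Auto*` classes, the swap is essentially a one-line change (a sketch; `answerdotai/ModernBERT-base` is the base checkpoint from the linked post and needs a recent transformers release):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "answerdotai/ModernBERT-base"  # was "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# num_labels should match your tag count, as in the DistilBERT sketch above.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=num_labels)
```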
ramoz | 6 days ago
Please provide updates when you have them.
weird-eye-issue | 6 days ago
It's going to perform badly unless you have very few tags and they're easy to classify.