| ▲ | daemonologist 2 hours ago | |
First impressions are that this model is extremely good - the "zero-shot" text prompted detection is a huge step ahead of what we've seen before (both compared to older zero-shot detection models and to recent general purpose VLMs like Gemini and Qwen). With human supervision I think it's even at the point of being a useful teacher model. I put together a YOLO tune for climbing hold detection a while back (trained on 10k labels) and this is 90% as good out of the box - just misses some foot chips and low contrast wood holds, and can't handle as many instances. It would've saved me a huge amount of manual annotation though. | ||
| ▲ | rocauc an hour ago | parent [-] | |
As someone that works on a platform users have used for labeling 1B images, I'm bullish SAM 3 can automate at least 90% of the work. Data prep is flipped to models being human-assisted instead of humans being model-assisted (see "autolabel" https://blog.roboflow.com/sam3/). I'm optimistic majority of users can now start deploying a model to then curate data instead of the inverse. | ||