Was evaluating YOLO26 within the last month for its on-device (iPhone 16 Pro) segmentation capabilities. It's decent, but its biggest limitation is that it's only trained on the 80 COCO classes (meaning a fixed set of categories from pre-labeled images). If whatever is in your images isn't in those 80 classes, it's invisible to YOLO26. By contrast, I have SAM2 running on-device and it's my current workhorse. The biggest benefit of SAM2 for me is that it produces fine-grained segmentation masks but isn't trained on class-labeled images. This was a specific requirement for the app I'm building. SAM2 isn't anywhere near as speedy as the native Vision framework APIs, but it is more capable across a vastly wider array of potential image targets.

larodi 5 hours ago

I would prefer GroundingDINO, which is a sort of SAM and DINO combo that does open-vocabulary detection.

	geuis 2 hours ago
		Doesn't work for my use case. GroundingDINO is a text-to-bounding-box model. SAM2 supports coordinate-based prompts (the user taps or clicks somewhere in an image) and returns a mask for that point, which is what my research app needs.
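
To make the tap-based prompt concrete: a point prompt is essentially a pixel coordinate plus a foreground/background label. Below is a minimal Python sketch of turning a tap in view coordinates into such a prompt. The function name `tap_to_point_prompt`, the aspect-fit (letterboxed, centered) display assumption, and the returned dictionary layout are all illustrative assumptions, not taken from the app discussed above.

```python
def tap_to_point_prompt(tap_xy, view_size, image_size, foreground=True):
    """Map a tap in view coordinates to an image-pixel point prompt.

    Assumption: the image is drawn aspect-fit and centered in the view,
    so it may be letterboxed. Returns None if the tap lands outside the
    displayed image.
    """
    tap_x, tap_y = tap_xy
    view_w, view_h = view_size
    img_w, img_h = image_size

    # Uniform scale that fits the whole image inside the view.
    scale = min(view_w / img_w, view_h / img_h)

    # Size of the letterbox bars on each side.
    offset_x = (view_w - img_w * scale) / 2
    offset_y = (view_h - img_h * scale) / 2

    # Undo the offset and scale to get image-pixel coordinates.
    x = (tap_x - offset_x) / scale
    y = (tap_y - offset_y) / scale

    if not (0 <= x < img_w and 0 <= y < img_h):
        return None  # tap fell in the letterbox area

    # Label 1 marks a foreground point, 0 a background point.
    return {
        "point_coords": [[x, y]],
        "point_labels": [1 if foreground else 0],
    }


# Hypothetical example: a 1024x1024 image shown in a 512x1024 view.
prompt = tap_to_point_prompt((256, 512), (512, 1024), (1024, 1024))
```

The key names are chosen to mirror the `point_coords` and `point_labels` arguments of the reference SAM2 Python image predictor; an on-device port may expect a different input layout, so treat the dictionary as a placeholder. A text-to-box model like GroundingDINO has no equivalent of this input, which is the mismatch described above.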