Meta Segment Anything Model 3 (ai.meta.com)
127 points by lukeinator42 5 hours ago | 28 comments

daemonologist, 42 minutes ago:
First impressions are that this model is extremely good. The "zero-shot" text-prompted detection is a huge step ahead of what we've seen before, both compared to older zero-shot detection models and to recent general-purpose VLMs like Gemini and Qwen. With human supervision I think it's even at the point of being a useful teacher model.

I put together a YOLO tune for climbing hold detection a while back (trained on 10k labels) and this is 90% as good out of the box. It just misses some foot chips and low-contrast wood holds, and can't handle as many instances. It would've saved me a huge amount of manual annotation though.
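For anyone curious what that teacher-to-student handoff looks like, here's a minimal sketch of auto-labeling a folder of images and writing YOLO-format label files. The `model.detect(path, prompts)` interface is a placeholder of my own, not the released SAM 3 API; only the box-to-YOLO conversion is meant literally.

    from pathlib import Path

    def write_yolo_labels(model, image_dir, label_dir, prompts, img_w, img_h):
        """Auto-label images with a text-prompted teacher and write YOLO .txt files.

        `model.detect(path, prompts)` is a hypothetical wrapper returning
        [(class_index, x0, y0, x1, y1), ...] in pixel coordinates.
        """
        out = Path(label_dir)
        out.mkdir(parents=True, exist_ok=True)
        for img_path in sorted(Path(image_dir).glob("*.jpg")):
            rows = []
            for cls, x0, y0, x1, y1 in model.detect(img_path, prompts=prompts):
                # YOLO label format: class cx cy w h, all normalized to [0, 1]
                cx, cy = (x0 + x1) / 2 / img_w, (y0 + y1) / 2 / img_h
                w, h = (x1 - x0) / img_w, (y1 - y0) / img_h
                rows.append(f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
            (out / f"{img_path.stem}.txt").write_text("\n".join(rows))

With prompts like ["climbing hold", "foot chip"] you'd still want to spot-check the low-confidence frames by hand before training on them.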

gs17, an hour ago:
The 3D mesh generator is really cool too: https://ai.meta.com/sam3d/

It's not perfect, but it seems to handle occlusion very well (e.g. a person in a chair can be separated into a person mesh and a chair mesh) and it's very fast.

clueless, an hour ago:
With an average latency of 4 seconds, this still couldn't be used for real-time video, correct?

hodgehog11, an hour ago:
This is an incredible model. But once again, we find an announcement for a new AI model with highly misleading graphs. That SA-Co Gold graph is particularly bad. Looks like I have another bad graph example for my introductory stats course...

bangaladore, 19 minutes ago:
Probably still can't get past a Google CAPTCHA when on a VPN. Do I click the square with the shoe of the person who's riding the motorcycle?

rocauc, an hour ago:
A brief history:

SAM 1: visual prompts to create pixel-perfect masks in an image. No video. No class names. No open vocabulary.
SAM 2: visual prompting for tracking on images and video. No open vocab.
SAM 3: open-vocabulary concept segmentation on images and video.

Roboflow has been long on zero-/few-shot concept segmentation. We've opened up a research preview exploring a SAM 3-native direction for creating your own model: https://rapid.roboflow.com/

yeldarb, 2 hours ago:
We (Roboflow) have had early access to this model for the past few weeks. It's really, really good. This feels like a seminal moment for computer vision; I think there's a real possibility this launch goes down in history as "the GPT Moment" for vision.

The two areas where I think this model is going to be transformative in the immediate term are rapid prototyping and distillation. Two years ago we released Autodistill [1], an open source framework that uses large foundation models to create training data for small realtime models. I'm convinced the idea was right, but too early; there wasn't a big model good enough to be worth distilling from back then. SAM3 is finally that model (and will be available in Autodistill today).

We are also taking a big bet on SAM3 and have built it into Roboflow as an integral part of the entire build and deploy pipeline [2], including a brand new product called Rapid [3], which reimagines the computer vision pipeline in a SAM3 world. It feels really magical to go from an unlabeled video to a fine-tuned realtime segmentation model with minimal human intervention in just a few minutes (and we rushed the release of our new SOTA realtime segmentation model [4] last week because it's the perfect lightweight complement to the large & powerful SAM3). We also have a playground [5] up where you can play with the model and compare it to other VLMs.

[1] https://github.com/autodistill/autodistill
[2] https://blog.roboflow.com/sam3/
[3] https://rapid.roboflow.com
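For readers who haven't seen the Autodistill pattern, the whole loop is roughly "label with a big model, then train a small model on the result." Here's a sketch using the existing GroundedSAM base model; the SAM3 base model mentioned above should slot into the same flow once it ships (I'm not guessing at its module name), and the parameter names below follow the Autodistill README from memory, so check the current docs.

    from autodistill.detection import CaptionOntology
    from autodistill_grounded_sam import GroundedSAM  # existing base model; swap in the SAM3 one when it lands
    from autodistill_yolov8 import YOLOv8

    # Map text prompts (what the big model is asked for) to class names (what the small model learns).
    ontology = CaptionOntology({"climbing hold": "hold", "foot chip": "chip"})

    # 1. Label a folder of raw images with the foundation ("base") model.
    base_model = GroundedSAM(ontology=ontology)
    base_model.label(input_folder="./images", extension=".jpg", output_folder="./dataset")

    # 2. Train a small realtime ("target") model on the auto-generated dataset.
    target_model = YOLOv8("yolov8n.pt")
    target_model.train("./dataset/data.yaml", epochs=50)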

xfeeefeee, an hour ago:
I can't wait until it's easy and accessible to rotoscope / greenscreen / mask this stuff out for videos. I had tried Runway ML but it was... lacking, and the web UI for fixing parts of it had similar issues.

I'm curious how this works for hair and transparent/translucent things. Probably not the best, but it doesn't seem to be mentioned anywhere? Presumably the output is a hard mask boundary (a polygon/vector) rather than an alpha matte?

fzysingularity, 2 hours ago:
SAM3 is cool. You can already do this more interactively on chat.vlm.run [1], and do much more. It's built on our new Orion [2] model; we've been able to integrate with SAM and several other computer-vision models in a truly composable manner. Video segmentation and tracking are also coming soon!

HowardStark, an hour ago:
Curious if anyone has done anything meaningful with SAM2 and streaming. SAM3 has built-in streaming support, which is very exciting. I've seen setups where people use an in-memory FS to write frames of the stream for SAM2. Maybe that is good enough?
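The in-memory FS trick is pretty simple in practice: dump frames from the stream into a tmpfs-backed directory and point the file-based video predictor at it. A Linux-specific sketch follows; the commented SAM2 calls at the end follow the repo's video-predictor example from memory, so verify the names against the version you install.

    import cv2
    from pathlib import Path

    # Write frames from a live stream into a RAM-backed directory (tmpfs),
    # so the file-based video predictor never touches disk.
    FRAME_DIR = Path("/dev/shm/sam_stream")
    FRAME_DIR.mkdir(parents=True, exist_ok=True)

    cap = cv2.VideoCapture(0)  # or an RTSP/HTTP stream URL
    idx = 0
    while idx < 300:  # grab a bounded window of frames
        ok, frame = cap.read()
        if not ok:
            break
        # Zero-padded JPEG filenames so the frames sort in order.
        cv2.imwrite(str(FRAME_DIR / f"{idx:05d}.jpg"), frame)
        idx += 1
    cap.release()

    # From here you'd point the video predictor at FRAME_DIR, e.g.:
    #   predictor = build_sam2_video_predictor(model_cfg, checkpoint)
    #   state = predictor.init_state(video_path=str(FRAME_DIR))
    #   ... add point/box prompts, then predictor.propagate_in_video(state)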

dangoodmanUT, an hour ago:
This model is incredibly impressive. Text is definitely the right modality, and now the ability to intertwine it with an LLM creates insane unlocks - my mind is already storming with ideas of projects that are now not only possible, but trivial.

sciencesama, an hour ago:
Does the license allow commercial use?

exe34, 11 minutes ago:
Can anyone confirm whether this fits on a 3090? The files look to be about 3.5 GB, but I can't work out what the overall memory needs will be.
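Back-of-envelope only, and the activation multiplier below is a guess rather than a measurement, but 3.5 GB of weights should leave a lot of headroom on a 24 GB card:

    # Rough VRAM estimate from checkpoint size alone. Assumptions (mine, not from
    # the release): fp32 weights on disk, fp16/bf16 at inference, and a guessed
    # ~2.5x activation overhead on top of the weights at high-res image input.
    ckpt_gb = 3.5
    params = ckpt_gb * 1e9 / 4              # ~875M parameters if stored in fp32
    weights_fp16_gb = params * 2 / 1e9      # ~1.75 GB resident in half precision
    peak_gb = weights_fp16_gb * (1 + 2.5)   # weights + guessed activation overhead
    print(f"~{params / 1e6:.0f}M params, rough peak ~{peak_gb:.1f} GB of a 3090's 24 GB")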

foota, 5 minutes ago:
Obligatory xkcd: https://xkcd.com/1425/