Meta Segment Anything Model 3

Posted by lukeinator42 11 hours ago

Meta Segment Anything Model 3(ai.meta.com)

280 points | 57 commentspage 2

yeldarb 7 hours ago|

We (Roboflow) have had early access to this model for the past few weeks. It's really, really good. This feels like a seminal moment for computer vision. I think there's a real possibility this launch goes down in history as "the GPT Moment" for vision. The two areas I think this model is going to be transformative in the immediate term are for rapid prototyping and distillation.

Two years ago we released autodistill[1], an open source framework that uses large foundation models to create training data for training small realtime models. I'm convinced the idea was right, but too early; there wasn't a big model good enough to be worth distilling from back then. SAM3 is finally that model (and will be available in Autodistill today).

We are also taking a big bet on SAM3 and have built it into Roboflow as an integral part of the entire build and deploy pipeline[2], including a brand new product called Rapid[3], which reimagines the computer vision pipeline in a SAM3 world. It feels really magical to go from an unlabeled video to a fine-tuned realtime segmentation model with minimal human intervention in just a few minutes (and we rushed the release of our new SOTA realtime segmentation model[4] last week because it's the perfect lightweight complement to the large & powerful SAM3).

We also have a playground[5] up where you can play with the model and compare it to other VLMs.

[1] https://github.com/autodistill/autodistill

[2] https://blog.roboflow.com/sam3/

[3] https://rapid.roboflow.com

[4] https://github.com/roboflow/rf-detr

[5] https://playground.roboflow.com

sorenjan 5 hours ago||

SAM3 is probably a great model to distill from when training smaller segmentation models, but isn't their DINOv2 a better example of a large foundation model to distill from for various computer vision tasks? I've seen it used for as starting point for models doing segmentation and depth estimation. Maybe there's a v3 coming soon?

https://dinov2.metademolab.com/

yeldarb 14 minutes ago|||

We used DINOv2 as the backbone of our RF-DETR model, which is SOTA on realtime object detection and segmentation: https://github.com/roboflow/rf-detr

It makes a great target to distill SAM3 to.

nsingh2 5 hours ago|||

DINOv3 was released earlier this year: https://ai.meta.com/dinov3/

I'm not sure if the work they did with DINOv3 went into SAM3. I don't see any mention of it in the paper, though I just skimmed it.

mchusma 4 hours ago|||

Thanks for the linkes! Can we run rf-detr in the browser for background removal? This wasn't clear to me from the docs

yeldarb 13 minutes ago||

We have a JS SDK that supports RF-DETR: https://docs.roboflow.com/deploy/sdks/web-browser

dangoodmanUT 7 hours ago||

I was trying to figure out from their examples, but how are you breaking up the different "things" that you can detect in the image? Are you just running it with each prompt individually?

rocauc 7 hours ago||

The model supports batch inference, so all prompts are sent to the model, and we parse the results.

xfeeefeee 7 hours ago||

I can't wait until it is easy to rotoscope / greenscreen / mask this stuff out accessibly for videos. I had tried Runway ML but it was... lacking, and the webui for fixing parts of it had similar issues.

I'm curious how this works for hair and transparent/translucent things. Probably not the best, but does not seem to be mentioned anywhere? Presumably it's just a straight line or vector rather than alpha etc?

rocauc 7 hours ago||

I tried it on transparent glass mugs, and it does pretty well. At least better than other available models: https://i.imgur.com/OBfx9JY.png

Curious if you find interesting results - https://playground.roboflow.com

nodja 7 hours ago||

I'm pretty sure davinci resolve does this already, you can even track it, idk if it's available in the free version.

torginus 3 hours ago||

These models have been super cool and it'd be nice if they made it into some editing program. Is there anything consumer focused that has this tech?

Redster 3 hours ago||

https://news.ycombinator.com/item?id=44736202

"Krita plugin Smart Segments lets you easily select objects using Meta’s Segment Anything Model (SAM v2). Just run the tool, and it automatically finds everything on the current layer. You can click or shift-click to choose one or more segments, and it converts them into a selection."

embedding-shape 3 hours ago|||

I think DaVinci Resolve probably have the best, professional-grade usage of ML models today, but they're not "AI Features Galore" about it when it's there. They might mention it as "Paint Out Unwanted Objects" or similar. From the latest release (https://www.blackmagicdesign.com/products/davinciresolve/wha...), I think I could spot 3-4 features at least that are using ML underneath, but aren't highlighted as "AI" at all. Still very useful stuff.

127 3 hours ago||

ComfyUI addon for Krita is pretty close I think.

rocauc 7 hours ago||

A brief history. SAM 1 - Visual prompt to create pixel-perfect masks in an image. No video. No class names. No open vocabulary. SAM 2 - Visual prompting for tracking on images and video. No open vocab. SAM 3 - Open vocab concept segmentation on images and video.

Roboflow has been long on zero / few shot concept segmentation. We've opened up a research preview exploring a SAM 3 native direction for creating your own model: https://rapid.roboflow.com/

HowardStark 7 hours ago||

Curious if anyone has done anything meaningful with SAM2 and streaming. SAM3 has built-in streaming support which is very exciting.

I’ve seen versions where people use an in-memory FS to write frames of stream with SAM2. Maybe that is good enough?

tom-in-july 5 hours ago|

The native support for streaming in SAM3 is awesome. Especially since it should also remove some of the memory accumulation for long sequences.

I used SAM2 for tracking tumors in real-time MRI images. With the default SAM2 and loading images from the da, we could only process videos with 10^2 - 10^3 frames before running out of memory.

By developing/adapting a custom version (1) based on a modified implementation with real (almost) stateless streaming (2) we were able to increase that to 10^5 frames. While this was enough for our purposes, I spend way too much time debugging/investigating tiny differences between SAM2 versions. So it’s great that the canonical version now supports streaming as well.

(Side note: I also know of people using SAM2 for real-time ultrasound imaging.)

1 https://github.com/LMUK-RADONC-PHYS-RES/mrgrt-target-localiz...

2 https://github.com/Gy920/segment-anything-2-real-time

dangoodmanUT 7 hours ago||

This model is incredibly impressive. Text is definitely the right modality, and now the ability to intertwine it with an LLM creates insane unlocks - my mind is already storming with ideas of projects that are now not only possible, but trivial.

nowittyusername 5 hours ago||

This thing rocks. i can imagine soo many uses for it. I really like the 3d pose estimation especially

bangaladore 6 hours ago|

Probably still can't get past a Google Captcha when on a VPN. Do I click the square with the shoe of the person who's riding the motorcycle?

conception 6 hours ago|

There are services you can get that will bypass those with a browser extension for you.

More comments...