
Posted by lukeinator42 11/19/2025

Meta Segment Anything Model 3 (ai.meta.com)
692 points | 134 comments
____tom____ 11/20/2025|
Ok, I tried the convert-body-to-3D feature, which it seems to do well, but it just gives me the image; I see no way to export or use it. I can rotate it, but that's it.

Is there some functionality I'm missing? I've tried Safari and Firefox.

FeiyouG 11/20/2025||
If you open inspect element you can download the blob there. It is a .ply file and you can view it in any splat viewer.
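If you want to poke at it programmatically, here's a minimal sketch of reading the exported file with the plyfile package ("body.ply" is just a placeholder for whatever you named the saved blob):

    # pip install plyfile
    from plyfile import PlyData

    # "body.ply" is a placeholder name for the blob saved from the network tab.
    ply = PlyData.read("body.ply")

    # Splat-style .ply files typically have a single "vertex" element whose
    # per-point properties include x/y/z positions plus splat attributes.
    vertex = ply["vertex"]
    print(len(vertex.data), "points")
    print(vertex.data.dtype.names)  # list the stored per-point properties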
nmfisher 11/20/2025||
I didn't look too closely, but it wouldn't surprise me if this was intentional. Many of these Meta/Facebook projects don't have open licenses, so they never graduate from web demos. Their voice cloning model was the same.
featureofone 11/20/2025||
The SAM models are great. I used the latest version when building VideoVanish ( https://github.com/calledit/VideoVanish ), a video-editing GUI for removing objects from videos, i.e. making them vanish.

That used SAM 2, and in my experience SAM 2 was more or less perfect—I didn’t really see the need for a SAM 3. Maybe it could have been better at segmenting without input.

But the new text prompt input seems nice; much easier to automate stuff using text input.
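For reference, the SAM 2 video workflow that VideoVanish builds on looks roughly like this (a sketch based on the facebookresearch/sam2 repo; the config and checkpoint paths are placeholders, so check them against the current release):

    import numpy as np
    import torch
    from sam2.build_sam import build_sam2_video_predictor

    # Config and checkpoint paths are placeholders.
    predictor = build_sam2_video_predictor(
        "configs/sam2.1/sam2.1_hiera_l.yaml", "checkpoints/sam2.1_hiera_large.pt"
    )

    with torch.inference_mode():
        state = predictor.init_state(video_path="input.mp4")

        # Seed the object with one positive click on frame 0.
        predictor.add_new_points_or_box(
            state, frame_idx=0, obj_id=1,
            points=np.array([[480, 270]], dtype=np.float32),
            labels=np.array([1], dtype=np.int32),
        )

        # Propagate the mask through the rest of the video.
        for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
            masks = (mask_logits > 0.0).cpu().numpy()  # boolean mask per object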

jdprgm 11/20/2025|
Promising-looking tool. It would be useful to add a performance section to the readme for some ballpark of what to expect, even if it's just a reference point of one GPU.

I've been considering building something similar but focused on static stuff like watermarks, so just single masks. From the DiffuEraser page it seems performance is brutally slow, at less than 1 fps on 720p.

For watermarks you can use ffmpeg blur, which is of course super fast and looks good on certain kinds of mostly uniform content, like a sky, but terrible and very obvious against most backgrounds. I've gotten really good results on videos shot with static cameras by generating a single inpainted frame and then just using that as the "cover", cropped and blended over the watermark (or any object, really). Even better results come from completely stabilizing the video and balancing the color if it changes slightly over time. This of course only works if nothing moving intersects with the removed target; if the camera is moving, you need every frame inpainted.
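Both tricks are doable with stock ffmpeg filters; here's a rough sketch (the coordinates are made up, and cover.png stands in for the single inpainted frame):

    import subprocess

    # Hypothetical watermark region (top-right corner of a 1280x720 video).
    x, y, w, h = 1100, 30, 150, 60

    # Fast region blur: split the stream, blur a crop of it, overlay it back.
    subprocess.run([
        "ffmpeg", "-i", "in.mp4", "-filter_complex",
        f"[0:v]split[base][tmp];"
        f"[tmp]crop={w}:{h}:{x}:{y},boxblur=10[blur];"
        f"[base][blur]overlay={x}:{y}",
        "-c:a", "copy", "out_blur.mp4",
    ], check=True)

    # Static-camera "cover" trick: crop the patch out of one inpainted frame
    # and overlay it on every frame (overlay repeats the still by default).
    subprocess.run([
        "ffmpeg", "-i", "in.mp4", "-i", "cover.png", "-filter_complex",
        f"[1:v]crop={w}:{h}:{x}:{y}[patch];"
        f"[0:v][patch]overlay={x}:{y}",
        "-c:a", "copy", "out_cover.mp4",
    ], check=True)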

Thus far, all full-video inpainting like this has been so slow as to not be practically useful: casually removing watermarks takes tens of minutes instead of seconds, where I'd really want processing to be close to real-time. I've wondered what knobs can be turned, if any, to sacrifice quality in order to boost performance. My main ideas are to automate detecting and applying that single-frame technique to as much of the video as possible, then separately process all the other chunks with diffusion scaled down to some really small size like 240p, and then use AI-based upscaling on those chunks, which seems to be fairly fast these days compared to diffusion.

featureofone 11/20/2025||
Good point — I’ll add that to the README.

Masking is fast — more or less real-time, maybe even a bit faster.

However, infill is not real-time. It runs at about 0.8 FPS on an RTX 3090 at 860p (which is the default resolution of the underlying networks).

There are much faster models out there, but as of now none that match the visual quality and can run on a consumer GPU. The use case for VideoVanish is geared more towards professional or hobby video editing, e.g., you filmed a scene for a video or movie and don't want to spend two days doing manual inpainting.

VideoVanish does have an option to run the infill at a lower resolution, filling only the infilled areas using the low-resolution output; that way you can trade visual fidelity for speed. Depending on what's behind the patches, this can be a very viable approach.
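Conceptually the composite step looks something like this (a numpy/OpenCV sketch, not the actual VideoVanish code; run_infill stands in for whatever inpainting model you use):

    import cv2
    import numpy as np

    def composite_lowres_infill(frame, mask, run_infill, scale=0.5):
        """frame: HxWx3 uint8, mask: HxW bool (True = region to fill).

        run_infill is a stand-in for the inpainting model; it takes a
        downscaled frame and mask and returns the small inpainted frame.
        """
        h, w = frame.shape[:2]
        sh, sw = int(h * scale), int(w * scale)

        small = cv2.resize(frame, (sw, sh), interpolation=cv2.INTER_AREA)
        small_mask = cv2.resize(mask.astype(np.uint8), (sw, sh),
                                interpolation=cv2.INTER_NEAREST).astype(bool)

        filled_small = run_infill(small, small_mask)

        # Upscale the low-res result and copy it back only where the mask is
        # set, so untouched pixels keep their full resolution.
        filled = cv2.resize(filled_small, (w, h), interpolation=cv2.INTER_CUBIC)
        out = frame.copy()
        out[mask] = filled[mask]
        return out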

xfeeefeee 11/19/2025||
I can't wait until it's easy and accessible to rotoscope / greenscreen / mask this stuff out of videos. I had tried Runway ML but it was... lacking, and the web UI for fixing parts of it had similar issues.

I'm curious how this works for hair and transparent/translucent things. Probably not the best, but it does not seem to be mentioned anywhere? Presumably the mask is just a hard boundary or vector rather than an alpha matte, etc.?

rocauc 11/19/2025||
I tried it on transparent glass mugs, and it does pretty well. At least better than other available models: https://i.imgur.com/OBfx9JY.png

Curious if you find interesting results - https://playground.roboflow.com

nodja 11/19/2025||
I'm pretty sure DaVinci Resolve does this already; you can even track it. Idk if it's available in the free version.
Ey7NFZ3P0nzAe 11/20/2025||
> *Core contributor (Alphabetical, Equal Contribution), Intern, †Project leads, §Equal Contribution

I like seeing this

raindear 11/20/2025||
Progress in computer vision has been slow in the last ~5 years; we are still not close to human performance. This is in contrast to language understanding, which has been solved: LLMs understand text on a human level (even if they have other limitations). But vision isn't solved. Foundation models struggle to segment some objects, they don't generalize to domains such as scientific images, etc. I wonder what's missing from these models. We have enough data in videos. Is it compute? Is the task not informative enough? Do we need agency in 3D?
tarsinge 11/20/2025||
I’m not an expert in the field but intuitively from my own experience I’d say what’s missing is a world model. By trying to be more conscious about my own vision I’ve started to notice how common it is that I fail to recognize a shape and then use additional knowledge, context and extrapolations to deduce what it can be.

A few examples I encountered recently: if I take a picture of my living room, many random objects would be impossible for a stranger to identify but easy for the household members. Or when driving at night, say I see a big dark shape coming from the side of the road: if I'm a local I'll know there are horses in that field and that it is fenced, or I might have read a warning sign earlier that lets me deduce what I'm seeing a few minutes later.

People are usually not conscious of this, but you can try to block out that additional information and process only what's really coming from your eyes, and realize how quickly it becomes insufficient.

patates 11/20/2025||
> If I take a picture of my living room many random object would be impossible to identify by a stranger but easy by the household members.

Uneducated question, so it may sound silly: a sufficiently complex vision model must have seen a million living rooms and the random objects in them to make some good guesses, no?

parineum 11/20/2025||
> LLMs understand text on a human level (even if they have other limitations).

Limitations like understanding...

visioninmyblood 11/20/2025||
The problem is the data. LLM data is self-supervised. Vision data is very sparsely annotated in the real world. Going a step further, robotics data is much sparser. So getting these models to improve on this long-tail distribution will take time.
pacifi30 11/20/2025||
Grateful to Meta for releasing models and giving GPU access for free; it has been great for experimenting without the overhead of worrying about paying too much for inference. Thank you Zuck.
torginus 11/20/2025||
These models have been super cool, and it'd be nice if they made it into some editing program. Is there anything consumer-focused that has this tech?
Redster 11/20/2025||
https://news.ycombinator.com/item?id=44736202

"Krita plugin Smart Segments lets you easily select objects using Meta’s Segment Anything Model (SAM v2). Just run the tool, and it automatically finds everything on the current layer. You can click or shift-click to choose one or more segments, and it converts them into a selection."

torginus 11/20/2025||
This is a good start; however, it looks more like a hobbyist experiment than a polished integration of these new techniques into the software.

Also LOL @ the pictures in the readme on GitHub

nuclearsugar 11/20/2025|||
Here are two plugins for After Effects - https://aescripts.com/mask-prompter/ https://aescripts.com/depth-scanner-lite/
embedding-shape 11/20/2025|||
I think DaVinci Resolve probably has the best professional-grade usage of ML models today, but they're not "AI Features Galore" about it. They might mention a feature as "Paint Out Unwanted Objects" or similar. From the latest release (https://www.blackmagicdesign.com/products/davinciresolve/wha...), I could spot at least 3-4 features that use ML underneath but aren't highlighted as "AI" at all. Still very useful stuff.
127 11/20/2025||
The ComfyUI addon for Krita is pretty close, I think.
visioninmyblood 11/20/2025||
Claude, Gemini, and ChatGPT do image segmentation in surprising ways. We did a small evaluation [1] of different frontier models for image segmentation and understanding, and Claude is by far the most surprising in its results.

[1] https://news.ycombinator.com/item?id=45996392
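This kind of probing is easy to reproduce; a sketch of asking a frontier model for a rough polygon via the OpenAI Python client (the model choice and prompt are just examples, not necessarily what the linked evaluation used):

    import base64
    from openai import OpenAI

    client = OpenAI()

    with open("mug.png", "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Outline the glass mug as a JSON list of [x, y] "
                         "polygon vertices in pixel coordinates. JSON only."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)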

sciencesama 11/19/2025|
Does the license allow for commercial purposes?
rocauc 11/19/2025||
Yes. It's a custom license with an Acceptable Use Policy that prohibits military use and includes export restrictions. The custom license permits commercial use.
nebula8804 11/20/2025||
If this is what's in the consumer space, I'd imagine the government has something much more advanced. It's probably a foregone conclusion that they are recording the entire country (maybe the world) and storing everyone's movements, or are getting close to it.
visioninmyblood 11/19/2025|||
I just checked and it seems to be commercially permissible. Companies like vlm.run and Roboflow are using it commercially, as shown by their comments below. So I guess it can be used for commercial purposes.
rocauc 11/19/2025||
Yes. But also note that redistribution of SAM 3 requires using the same SAM 3 license downstream. So libraries that attempt to, e.g., relicense the model as AGPL are non-compliant.
colesantiago 11/19/2025||
Yes, the license allows you to grift for your “AI startup”