Our new SAM audio model transforms audio editing

Posted by ushakov 12/16/2025

Our new SAM audio model transforms audio editing (about.fb.com)

168 points | 66 comments | page 2

ac2u 12/23/2025|

I wonder if the segmentation would work with a video of a ventriloquist and a dummy?

websiteapi 12/23/2025||

I wonder if it works for speaker diarization out of the box. I've found that open source speaker diarization that doesn't require a lot of tweaking is basically non-existent.

hamza_q_ 12/23/2025|

Yeah, I was frustrated by slow, hard-to-use OSS diarization too; I recently released a library to address that. Check it out: https://github.com/narcotic-sh/senko

Also https://zanshin.sh, if you'd like speaker diarization when watching YouTube videos

noman-land 12/23/2025|||

Hey, thanks for this. Been trying it out and it's very fast but seems to hear more speakers than are in the audio. I didn't see a way to tweak speaker similarity settings or merge speakers in some way. Any advice?

hamza_q_ 12/25/2025||

Thanks for checking it out!

Yeah, unfortunately, since the diarization is based on acoustic features, it really does need high-fidelity voice recordings to get the best results. However, I just added another knob to the Diarizer class called mer_cos, which controls the speaker merging threshold. The default is 0.875, so perhaps try lowering it to 0.8. That should help.
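To make the effect of a merge threshold like `mer_cos` concrete, here is a minimal sketch of threshold-based speaker merging. It is not Senko's actual implementation: the function names, the toy 2-D centroids, and the choice to merge transitively with union-find are all assumptions for illustration. Only the 0.875 default and the suggested 0.8 come from the comment above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def merge_speakers(centroids, threshold=0.875):
    """Hypothetical helper: group speaker centroids whose cosine
    similarity is >= threshold. Returns one label per centroid,
    numbered from 0 in order of first appearance."""
    parent = list(range(len(centroids)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if cosine(centroids[i], centroids[j]) >= threshold:
                parent[find(j)] = find(i)

    relabel = {}
    return [relabel.setdefault(find(i), len(relabel))
            for i in range(len(centroids))]

# Toy centroids (made up): the first two are similar (cosine ~0.85),
# the third is clearly different.
centroids = [[1.0, 0.0], [0.85, 0.53], [0.0, 1.0]]
strict = merge_speakers(centroids, threshold=0.875)  # keeps 3 speakers
loose = merge_speakers(centroids, threshold=0.8)     # merges the first two
```

With the stricter threshold the two similar centroids stay separate, which matches the "hears more speakers than are in the audio" symptom; lowering the threshold folds them into one speaker.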

I'll also get around to adding an oracle/min/max speakers feature at some point, for cases where you know the exact number of speakers ahead of time, or wanna set upper/lower bounds. Gotten busy with another project, so haven't done it yet. PRs welcome though! haha

noman-land 12/26/2025||

Thanks, `mer_cos` definitely gets me closer. I appreciate that. Yeah, I was thinking providing a param for the expected number of speakers would be nice. I'll check out the codebase and see if that's something I can contribute :).

hamza_q_ 12/26/2025||

Yeah would love contributions! Here's a brief overview of how I think it can be done:

Senko has two clustering types: (1) spectral for audio < 20 mins in length, and (2) UMAP+HDBSCAN for >= 20 mins. In the clustering code, spectral actually already supports oracle/min/max speakers, but UMAP+HDBSCAN doesn't. However, someone forked Senko and added min/max speakers to that here (for oracle, I guess min = max): https://github.com/DedZago/senko/commit/c33812ae185a5cd420f2...

So I think all that's required is basically just testing this thoroughly to make sure it doesn't introduce any regressions in clustering quality. And then just wiring the oracle/min/max parameters to the Diarizer class, or diarize() func.
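As a rough sketch of the wiring described above, the following shows how a duration-based clustering choice and oracle/min/max speaker options could be normalized before being handed to the clustering code. Every name here (`pick_clustering`, `resolve_speaker_bounds`, the string labels, the keyword arguments) is hypothetical and not part of Senko's API; only the 20-minute cutoff and the "oracle means min = max" idea come from the comment.

```python
TWENTY_MINUTES = 20 * 60  # cutoff in seconds, per the comment above

def pick_clustering(duration_seconds):
    """Hypothetical: choose a clustering type from the audio length."""
    if duration_seconds < TWENTY_MINUTES:
        return "spectral"
    return "umap_hdbscan"

def resolve_speaker_bounds(oracle=None, min_speakers=None, max_speakers=None):
    """Hypothetical: turn oracle/min/max options into (lo, hi) bounds.

    An oracle count is treated as min = max. A hi of None means
    no upper bound was requested.
    """
    if oracle is not None:
        if min_speakers is not None or max_speakers is not None:
            raise ValueError("pass either oracle or min/max, not both")
        if oracle < 1:
            raise ValueError("oracle must be at least 1")
        return oracle, oracle

    lo = 1 if min_speakers is None else min_speakers
    hi = max_speakers
    if lo < 1:
        raise ValueError("min_speakers must be at least 1")
    if hi is not None and hi < lo:
        raise ValueError("max_speakers must be >= min_speakers")
    return lo, hi

# Example: a 45-minute recording with exactly 3 known speakers.
method = pick_clustering(45 * 60)
bounds = resolve_speaker_bounds(oracle=3)
```

Normalizing to a single `(lo, hi)` pair up front would let both clustering paths accept the same arguments, so the regression testing mentioned above can focus on clustering quality rather than argument handling.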

websiteapi 12/23/2025|||

looks interesting. will check it out.

7734128 12/23/2025||

Finally a way to perhaps remove laugh tracks in the near future.

sefrost 12/23/2025|

There are examples on YouTube of laughter tracks being removed and there are lots of awkward pauses, so I think you'd need to edit the video to cut the pauses out entirely.

- https://www.youtube.com/watch?v=23M3eKn1FN0

- https://www.youtube.com/watch?v=DgKgXehYnnw

embedding-shape 12/23/2025||

Cutting the pauses will change the beats and rhythm of the scene, so you probably need to edit some of the voice lines and actual scenes too then. In the end, if you're not interested in the original performance and work, you might as well read the script instead and imagine it however you want, read it at the pace you want and so on.

vintermann 12/23/2025||

And have a video model render an entirely new version for you, I guess.

m3kw9 12/23/2025||

Can I create a continuous “who farted” detector? Would be great at parties

rmnclmnt 12/23/2025||

Bighead is back! « Fart Alert »!

IncreasePosts 12/23/2025||

Each person's unique fartprint is yet another way big tech will be tracking us

samat 12/23/2025|||

And ads based on a fart! I guess you could throw in some spectrography for content aware ads too!! ‘Hmm, I sense you like onions, you would love French soup in the restaurant downstairs today!’

BoorishBears 12/23/2025|||

They're already analyzing poop, what's a mic to go with your toilet camera?

https://www.kohlerhealth.com/dekoda/

theflyestpilot 12/23/2025||

sample anything model?

IndySun 12/24/2025|

A lot of comments here exhibit the Gell-Mann amnesia effect writ large.

AlexeyBelov 12/25/2025|

Your comment is just a meta-comment and that's just as bad. I suggest gently correcting people instead of just pointing out very non-specifically that someone is wrong.

IndySun 12/25/2025||

I have. I did. I do. But like so many cocktail sticks launched towards the mammoth, eventually one lobs a final ineffectual remark.

But also agreed (with you, yes), for the vast majority of moments, ignore and don't add more noise. But sometimes... human after all.