This combo has almost unbeatable accuracy, and it rejects background noise really well. It can even reject people talking in the background.
The only better thing I've seen is the Ursa model from Speechmatics. Not open weights, unfortunately.
P.S. Even the demo uses a remote server via WebSocket.
Depending on the permissions granted to apps on your mobile device, it can even be passively exfiltrated without you ever noticing - and that's ignoring the video clips people take and put online. Like your grandma uploading a short clip from a Christmas get-together to Facebook, or similar.
There have already been successful scams - e.g. calls from "relatives" (actually AI) telling family members they urgently need money and convincing them to send it...
Beyond that, I don't see how we stand to durably reduce military action by making languages mutually unintelligible.
https://simple.wikipedia.org/wiki/Russian_language#/media/Fi...