Posted by rzzzzru 17 hours ago
We already do this for ingesting podcasts and cutting their clips with text being highlighted as people speak. AssemblyAI also supports speaker diarization.
For videos recorded using our own livestreaming studio, we can bypass all this by using Web STT and TTS APIs resulting in perfect timing and diarization without the need for server side models.
need more pioneers to try it out and spread the word!