Posted by georgemandis 6/25/2025
Clearly the next thing we need to test is removing all the vowels from words, or something like that :)
One thing that I thought was fairly clear in my write-up but feels a little lost in the comments: I didn't just try this with Whisper. I tried it with their newer gpt-4o-transcribe model, which seems considerably faster. There's no way to run that one locally.
There is also probably a way to send a smaller sample of audio at different speeds and compare the transcripts, to find the speed-up that costs no quality for each individual clip.
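A rough sketch of that idea, in case it helps. Everything here is an assumption rather than anything from the write-up: `transcribe_at` is a hypothetical callable that speeds a short sample up by the given factor and returns the transcript, the 1x transcript stands in for ground truth, and the 5% word-error threshold is arbitrary.

```python
from typing import Callable, Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # Classic dynamic-programming Levenshtein distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def pick_speed(
    transcribe_at: Callable[[float], str],  # hypothetical: speed factor -> transcript
    speeds: Sequence[float],
    max_wer: float = 0.05,                  # assumed tolerance, tune per use case
) -> float:
    """Return the fastest speed whose transcript stays close to the 1x one."""
    baseline = transcribe_at(1.0)
    best = 1.0
    for speed in sorted(speeds):
        if speed <= 1.0:
            continue
        if word_error_rate(baseline, transcribe_at(speed)) <= max_wer:
            best = speed
        else:
            break  # assume quality only degrades as speed increases
    return best


# Stand-in transcripts so the sketch runs without calling any API.
fake_transcripts = {
    1.0: "the quick brown fox jumps over the lazy dog",
    2.0: "the quick brown fox jumps over the lazy dog",
    3.0: "the quick brown fox jumps over lazy dog",
}
chosen = pick_speed(lambda s: fake_transcripts[s], [2.0, 3.0])
```

With the stand-in transcripts, the 3x version drops a word and exceeds the threshold, so the search settles on 2x. The sample still costs a few extra transcription calls, so this only pays off when the full clip is much longer than the sample.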
Nice. Any blog post, twitter comment or anything pointing to that?
We were developing an AI that processed someone’s Instagram profile (reels, comments, etc.) and provided insights, and we realized we could 2x the audio to cut time and costs.
I could find a screenshot of our internal texts, I suppose, but we didn’t publish anything on it.
Or you can just copy the transcript that YouTube provides below the video.