Posted by georgemandis 6/25/2025

OpenAI charges by the minute, so speed up your audio (george.mand.is)
740 points | 228 comments
impossiblefork 6/25/2025|
Make the minutes longer, you mean.
another_twist 6/27/2025||
You'd need a WER comparison to check whether there's really no drop in quality. With this trick there might be trouble if the audio is noisy, and it may not always be obvious whether or not to speed up.
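
For reference, a rough sketch of that kind of check in Python (jiwer is one common WER library; the transcript strings here are placeholders):

    # pip install jiwer
    import jiwer

    reference = "transcript of the original-speed audio"   # e.g. a human-checked transcript
    hypothesis = "transcript of the 2x sped-up audio"       # output from the sped-up run

    error_rate = jiwer.wer(reference, hypothesis)  # word error rate; 0.0 means identical
    print(f"WER: {error_rate:.2%}")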
tmaly 6/25/2025||
The whisper model weights are free. You could save even more by just using them locally.
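
A minimal local run with the open-source whisper package might look like this (model size and file path are just examples):

    # pip install openai-whisper   (needs ffmpeg installed on the system)
    import whisper

    model = whisper.load_model("base")       # smaller checkpoints run fine on CPU
    result = model.transcribe("talk.mp3")    # path is a placeholder
    print(result["text"])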
pzo 6/25/2025|
but this is still a great trick if you want to reduce latency or inference time even with local models, e.g. in a realtime chatbot
cprayingmantis 6/26/2025||
I noticed something similar with images as inputs to Claude: you can scale down the images and still get good outputs. There is an accuracy drop-off at a certain point, but the token savings are worth doing a little tuning there.
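
A rough sketch of that kind of downscaling with Pillow before the image is sent to the API (the max side length and JPEG quality are assumptions you'd tune):

    # pip install pillow
    import base64
    import io
    from PIL import Image

    def downscale_to_base64_jpeg(path: str, max_side: int = 1024) -> str:
        """Shrink the image so its longest side is <= max_side, return base64-encoded JPEG."""
        img = Image.open(path).convert("RGB")
        img.thumbnail((max_side, max_side))      # preserves aspect ratio, resizes in place
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=85)
        return base64.b64encode(buf.getvalue()).decode("ascii")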
georgemandis 6/26/2025|
Definitely in the same spirit!

Clearly the next thing we need to test is removing all the vowels from words, or something like that :)

ryanar 6/26/2025||
In my experience, transcription software has no problem transcribing sped-up audio, or audio that is inaudible to humans or extremely loud (as long as it isn't clipped). I wonder if LLM transcription works the same.
donkey_brains 6/25/2025||
Hmm…doesn’t this technique effectively make the minute longer, not shorter? Because you can pack more speech into a minute of recording? Seems like making a minute shorter would be counterproductive.
StochasticLi 6/25/2025|
No. You're paying per minute of audio, and a sped-up minute is packed with more speech; you're not paying for how long the computation takes.
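
For context, the speed-up itself happens before upload, e.g. with ffmpeg's atempo filter (a sketch rather than the exact command from the post; paths and the 2x factor are placeholders):

    # needs ffmpeg installed on the system
    import subprocess

    def speed_up(src: str, dst: str, factor: float = 2.0) -> None:
        """Re-encode the audio at `factor` x speed so fewer billable minutes get uploaded."""
        subprocess.run(
            ["ffmpeg", "-y", "-i", src, "-filter:a", f"atempo={factor}", dst],
            check=True,
        )

    speed_up("talk.mp3", "talk_2x.mp3", 2.0)   # ~30 min of speech becomes ~15 billable min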
PeterStuer 6/26/2025||
I wonder how much time and battery transcoding/uploading/downloading over coffeeshop wifi would really save vs just running it locally through optimized Whisper.
georgemandis 6/26/2025|
I had this same thought and won't pretend my fear was rational, haha.

One thing that I thought was fairly clear in my write-up but feels a little lost in the comments: I didn't just try this with Whisper. I tried it with their newer gpt-4o-transcribe model, which seems considerably faster. There's no way to run that one locally.

xg15 6/25/2025||
That's really cool! Also, isn't this effectively the same as supplying audio with a sampling rate of 8kHz instead of the 16kHz that the model is supposed to work with?
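
For what it's worth, the two operations aren't identical: pitch-preserving time-stretching halves the duration while keeping the full 16 kHz band, whereas downsampling to 8 kHz keeps the duration but discards everything above 4 kHz. A small librosa sketch of the difference (file path and factors are placeholders):

    # pip install librosa soundfile
    import librosa

    y, sr = librosa.load("talk.mp3", sr=16000)             # mono audio at 16 kHz

    # Pitch-preserving 2x speed-up: half the duration, same sample rate.
    y_fast = librosa.effects.time_stretch(y, rate=2.0)

    # Downsampling to 8 kHz: same duration, but content above 4 kHz is gone.
    y_8k = librosa.resample(y, orig_sr=sr, target_sr=8000)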
ada1981 6/25/2025||
We discovered this last month.

There is also probably a way to send a smaller sample of the audio at different speeds and compare the results, to get a speed optimization with no quality loss unique to each clip.
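
A sketch of that idea: transcribe a short sample at a few speeds and keep the fastest one whose transcript still roughly matches the 1x baseline (the OpenAI client calls are real, but the sample length, speeds, and 5% WER threshold are arbitrary assumptions):

    # pip install openai jiwer   (needs ffmpeg installed on the system)
    import subprocess
    import jiwer
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def transcribe(path: str) -> str:
        with open(path, "rb") as f:
            return client.audio.transcriptions.create(model="whisper-1", file=f).text

    def sample_at_speed(src: str, dst: str, factor: float, seconds: int = 60) -> None:
        # Read only the first `seconds` of the input and re-encode it at `factor` x speed.
        subprocess.run(
            ["ffmpeg", "-y", "-t", str(seconds), "-i", src,
             "-filter:a", f"atempo={factor}", dst],
            check=True,
        )

    sample_at_speed("talk.mp3", "sample_1.0x.mp3", 1.0)
    baseline = transcribe("sample_1.0x.mp3")

    best = 1.0
    for factor in (1.5, 2.0):
        sample_at_speed("talk.mp3", f"sample_{factor}x.mp3", factor)
        drift = jiwer.wer(baseline, transcribe(f"sample_{factor}x.mp3"))
        if drift < 0.05:       # under ~5% drift from the 1x transcript: accept this speed
            best = factor
        else:
            break
    print(f"Fastest speed with acceptable drift: {best}x")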

moralestapia 6/25/2025||
>We discovered this last month.

Nice. Any blog post, Twitter comment, or anything pointing to that?

ada1981 7/3/2025||
We didn’t think to publish it; it actually seemed so obvious I assumed it was a widely known thing.

We were developing an AI that processes someone's Instagram profile (reels, comments, etc.) and provides insights, and we realized we could 2x the audio to cut time and costs.

appleaday1 6/25/2025||
source?
ada1981 7/3/2025||
Oh, I wasn't trying to take credit for it; we just discovered we could do this last month, assumed it was widely known, and implemented it.

I could find a screenshot of our internal texts, I suppose, but we didn't publish anything on it.

pottertheotter 6/26/2025|
You can just ask Gemini to summarize it for you. It's free. I do it all the time with YouTube videos.

Or you can just copy the transcript that YouTube provides below the video.
