Posted by meetpateltech 10 hours ago

Voxtral Transcribe 2(mistral.ai)
678 points | 167 comments | page 3
sbinnee 3 hours ago|
3 hours for a single request sounds nice to me. Although the graph suggests it won't perform as well as the OpenAI model I have been using, it is open source and I will surely give it a try.
sgt 4 hours ago||
What's the best way to train this further on a specific dialect or accent or even terminology?
ccleve 5 hours ago||
This looks great, but it's not clear to me how to use it for a practical task. I need to transcribe about 10 years worth of monthly meetings. These are government hearings with a variety of speakers. All the videos are on YouTube. What's the most practical and cost-effective way to get reasonably accurate transcripts?
IanCal 5 hours ago||
If you use something like yt-dlp you can download the audio from the meetings, and you could try things out in Mistral's AI Studio.

You could use their api (they have this snippet):

```
curl -X POST "https://api.mistral.ai/v1/audio/transcriptions" \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -F model="voxtral-mini-latest" \
  -F file=@"your-file.m4a" \
  -F diarize=true \
  -F timestamp_granularities="segment"
```

Through the API it took 18 s to transcribe a 20-minute audio file I had lying around, in which someone reviews a product.

I'm sure there will be ways of running this locally soon (if they aren't on Hugging Face already), but the API is $0.003/min. For something like 120 meetings (10 years of monthly ones) at 1 hr each, that's roughly $20. Depending on whether they're 1 or 10 hours long (or weekly rather than monthly, or 10 parallel sessions, or something), this might be a price you're willing to pay if you get the results back in an afternoon.
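To make the arithmetic above easy to redo for other meeting lengths, here is a minimal sketch. It uses only the $0.003/min price and the 120-meeting count from this comment; the function name `transcription_cost` and the 3-hour row are just illustrative, not anything Mistral publishes.

```python
def transcription_cost(meetings: int, hours_each: float,
                       usd_per_minute: float = 0.003) -> float:
    """Estimated API cost in USD for a batch of equally long recordings."""
    total_minutes = meetings * hours_each * 60
    return total_minutes * usd_per_minute


# 10 years of monthly meetings = 120 recordings; try a few plausible lengths.
for hours in (1, 3, 10):
    cost = transcription_cost(120, hours)
    print(f"{hours:>2} h per meeting: ${cost:,.2f}")
```

At 1 hr per meeting this comes to $21.60, which is the "roughly $20" figure; at 10 hrs per meeting it is ten times that.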

Edit: their realtime model can be run with vLLM; the batch model is not open.

isoprophlex 4 hours ago|||
- get an API key for this service

- make sure you have a list of all these YouTube meeting URLs somewhere

- ask your preferred coding assistant to write you up a script that downloads the audio for these videos with yt-dlp & calls Mistral's API

- ????

- profit
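For what it's worth, the script from the steps above might look roughly like this. It is a sketch under assumptions, not a tested pipeline: it shells out to `yt-dlp` and `curl` (both must be installed), reuses the endpoint and form fields from the curl snippet quoted earlier in the thread, and assumes the response body is worth saving as-is next to each audio file. The function names and the one-URL-per-line `urls_file` layout are made up for illustration.

```python
import subprocess
from pathlib import Path

# Endpoint and form fields are taken from the curl snippet quoted in this thread.
API_URL = "https://api.mistral.ai/v1/audio/transcriptions"


def download_cmd(url: str, out_dir: Path) -> list[str]:
    """yt-dlp command: -x extracts audio, --audio-format converts it to m4a."""
    template = str(out_dir / "%(id)s.%(ext)s")
    return ["yt-dlp", "-x", "--audio-format", "m4a", "-o", template, url]


def transcribe_cmd(audio: Path, api_key: str) -> list[str]:
    """curl command mirroring the quoted snippet, one audio file per request."""
    return [
        "curl", "-sS", "-X", "POST", API_URL,
        "-H", f"Authorization: Bearer {api_key}",
        "-F", "model=voxtral-mini-latest",
        "-F", f"file=@{audio}",
        "-F", "diarize=true",
        "-F", "timestamp_granularities=segment",
    ]


def run(urls_file: Path, out_dir: Path, api_key: str) -> None:
    """Download every URL listed in urls_file, then transcribe each audio file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    urls = [u.strip() for u in urls_file.read_text().splitlines() if u.strip()]
    for url in urls:
        subprocess.run(download_cmd(url, out_dir), check=True)
    for audio in sorted(out_dir.glob("*.m4a")):
        result = subprocess.run(transcribe_cmd(audio, api_key),
                                check=True, capture_output=True, text=True)
        # Assumption: the raw response body is saved untouched for later parsing.
        audio.with_suffix(".json").write_text(result.stdout)
```

You would call it as `run(Path("urls.txt"), Path("audio"), os.environ["MISTRAL_API_KEY"])`. Keeping the command builders separate from `run` means you can print and eyeball the commands before spending any money.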

jimmy76615 5 hours ago||
If they are on YouTube, try Gemini 3 Flash first. Use AI Studio; it lets you insert YouTube videos into context.
Archelaos 9 hours ago||
As a rule of thumb for software I use regularly, I find it useful to consider the cost over a 10-year period, so I can compare it with software that I buy once with a lifetime license and install at home. That comes to $1,798.80 for the Pro version.

What estimates do others use?

siddbudd 7 hours ago||
Wired advertises this as "Ultra-Fast Translation"[^1]. A bit weird coming from a tech magazine. I hope it's just a "typo".

[^1]: https://www.wired.com/story/mistral-voxtral-real-time-ai-tra...

bigyabai 7 hours ago|
It might be capable of translation; OpenAI Whisper was a transcription model that could do it.
yewenjie 7 hours ago||
One week ago I was on the hunt for an open source model that can do diarization, and I literally had to give up because I could not find any easy-to-use setup.
ashenke 6 hours ago||
I don't know if that will change, but right now only the Voxtral Mini Transcribe V2 supports diarization and it's not open-weight. The Voxtral Realtime model doesn't support diarization, but is open-weight.
vojto11 6 hours ago||
WhisperX?
asah 2 hours ago||
Smells Like Teen Spirit survives another challenge!

Voxtral Transcribe 2:

Light up our guns, bring your friends, it's fun to lose and to pretend. She's all the more selfish, sure to know how the dirty world. I wasn't what I'd be best before this gift I think best A little girl is always been Always will until again Well, the lights out, it's a stage And we are now entertainers. I'm just stupid and contagious. And we are now entertainers. I'm a lot of, I'm a final. I'm a skater, I'm a freak. Yeah! Hey! Yeah. And I forget just why I taste it Yeah, I guess it makes me smile I found it hard, it's hard to find the well Whatever, never mind Well, the lights out, it's a stage. You and I are now entertainers. I'm just stupid and contagious. You and I are now entertainers. I'm a lot of, I'm a minor. I'm a killer. I'm a beater. I'm a nerd. I'm a nerd. I'm a nerd. I'm a nerd. I'm a nerd. I'm a nerd. I'm a nerd. I'm a nerd. I'm a nerd. And I forget just why I taste it Yeah, I guess it makes me smile I found it hard, it's hard to find the well Whatever, never mind I know, I know, I know, I know, I know Well, the lights out, it's a stage. You and I are now entertainers. I'm just stupid and contagious. You and I are now entertainers. I'm a lot of, I'm a minor. I'm a killer. I'm a beater. I'm a nerd. I'm a nerd. I'm a nerd. I'm a nerd. I'm a nerd. I'm a nerd. I'm a nerd. I'm a nerd. I'm a nerd.

Google/Musixmatch:

Load up on guns, bring your friends It's fun to lose and to pretend She's over-bored, and self-assured Oh no, I know a dirty word Hello, hello, hello, how low? Hello, hello, hello, how low? Hello, hello, hello, how low? Hello, hello, hello With the lights out, it's less dangerous Here we are now, entertain us I feel stupid and contagious Here we are now, entertain us A mulatto, an albino A mosquito, my libido, yeah Hey, yey I'm worse at what I do best And for this gift, I feel blessed Our little group has always been And always will until the end Hello, hello, hello, how low? Hello, hello, hello, how low? Hello, hello, hello, how low? Hello, hello, hello With the lights out, it's less dangerous Here we are now, entertain us I feel stupid and contagious Here we are now, entertain us A mulatto, an albino A mosquito, my libido, yeah Hey, yey And I forget just why I taste Oh yeah, I guess it makes me smile I found it hard, it's hard to find Oh well, whatever, never mind Hello, hello, hello, how low? Hello, hello, hello, how low? Hello, hello, hello, how low? Hello, hello, hello With the lights out, it's less dangerous Here we are now, entertain us I feel stupid and contagious Here we are now, entertain us A mulatto, an albino A mosquito, my libido A denial, a denial A denial, a denial A denial, a denial A denial, a denial A denial

asah 2 hours ago|
(when it was released, adults/press/etc. found SLTS famously incomprehensible and then they realized that the kids didn't understand the lyrics either, and Weird Al nailed it with his classic, Smells Like Nirvana: https://www.google.com/search?q=Smells+Like+Nirvana )
jszymborski 7 hours ago||
I'm guessing I won't be able to fine-tune this until they come out with an HF Transformers model, right?
blobinabottle 6 hours ago|
Impressive results, tested on crappy audio files (in French and English)...