Posted by MattHart88 15 hours ago

Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS (github.com)
I built this because I wanted to see how far I could get with a voice-to-text app that uses 100% local models, so no data leaves my computer. I've been using it a ton for coding and emails. I'm experimenting with using it as a voice interface for my other agents too. It's 100% open-source under the MIT license; I'd love feedback, PRs, and ideas on where to take it.
383 points | 174 comments
kushalpandya 9 hours ago|
Speech-to-text is basically the AI version of the Todo app that we used to build every week when a new frontend framework was released.
jwr 4 hours ago||
I currently use MacWhisper and it is quite good, but it's great to see an alternative, especially as I've been looking to use more recent models!

I hope there will be a way to plug in other models: I currently work mostly with Whisper Large. Parakeet is slightly worse for non-English languages. But there are better recent developments.

ipsum2 15 hours ago||
Parakeet is significantly more accurate and faster than Whisper if it supports your language.
yeutterg 15 hours ago||
Are you running Parakeet with VoiceInk[0]?

[0]: https://github.com/beingpax/VoiceInk

ipsum2 13 hours ago|||
I'm using https://github.com/senstella/parakeet-mlx library.
zackify 14 hours ago|||
I am, and it's been working great for a long time now
rahimnathwani 15 hours ago|||
Right, and if you're on macOS you can use it for free with Hex: https://github.com/kitlangton/Hex
lloyd-christmas 13 hours ago||
Or write your own custom one with the library that backs it: https://github.com/FluidInference/FluidAudio

I did that so that I could record my own inputs and finetune parakeet to make it accurate enough to skip post-processing.
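For anyone curious what the "record my own inputs and finetune" step looks like in practice: Parakeet models come from NVIDIA NeMo, whose ASR finetuning scripts consume a JSONL manifest of audio/transcript pairs. A minimal sketch of building one with only the stdlib (the helper names are mine, and this isn't necessarily the exact pipeline used here):

```python
import json
import wave
from pathlib import Path

def wav_duration(path: Path) -> float:
    """Duration of a WAV file in seconds, via the stdlib wave module."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()

def build_manifest(pairs, out_path: Path) -> int:
    """Write a NeMo-style JSONL manifest from (wav_path, transcript) pairs.

    Each line is {"audio_filepath": ..., "duration": ..., "text": ...}.
    Returns the number of entries written.
    """
    n = 0
    with out_path.open("w") as f:
        for wav_path, text in pairs:
            entry = {
                "audio_filepath": str(wav_path),
                "duration": round(wav_duration(Path(wav_path)), 3),
                "text": text.strip(),
            }
            f.write(json.dumps(entry) + "\n")
            n += 1
    return n
```

The manifest then feeds whatever finetuning recipe you use; the point is that data collection is just audio files plus matching transcripts.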

rahimnathwani 11 hours ago||
There's a fork of FluidAudio that supports the recent Cohere model: https://github.com/altic-dev/FluidAudio/tree/B/cohere-coreml...

It's used by this dictation app: https://github.com/altic-dev/FluidVoice/

totetsu 8 hours ago|||
Parakeet supports Japanese now, but I can't find a version ported to Apple Silicon yet.
treetalker 15 hours ago|||
I have been using Parakeet with MacWhisper's hold-to-talk on a MacBook Neo and it's been awesome.
obrajesse 14 hours ago||
And indeed, Ghost Pepper supports Parakeet v3
ianmurrays 4 hours ago||
I had Claude make this hammerspoon config + daemon that does pretty much the same, in case anyone is interested.

https://github.com/ianmurrays/hammerspoon/blob/main/stt.lua

miki123211 5 hours ago||
What do you actually use for STT, particularly if you prize performance over privacy and are comfortable using your own API keys?

I was on WhisperFlow for a while until the trial ran out, and I'm really tempted to subscribe. I don't think I can go back to a local solution after that, the performance difference is insane.

k9294 3 hours ago|
Try ottex.ai - it has an OpenRouter like gateway with most STT models on the market (Gemini, OpenAI, Groq, Deepgram, Mistral, AssemblyAI, Soniox), so you can try them all and choose what works best for you.

My favorites are Gemini 3 Flash and Mistral Voxtral Transcribe 2. Gemini when I need special formatting and clean-up, and Voxtral when I need fast input (mostly when working with AI).

snickell 8 hours ago||
Can somebody help me understand how they use these? I feel like I'm missing something, or I'm bad at something.

I only spent 10 minutes with Handy, and a similar amount of time with SuperWhisper, so I'm pretty ignorant. I tried both while composing this comment and in a programming session with Codex. I was slightly frustrated not to be hands-free: instead of typing, my hands had to press and release a talk button (option-space in Handy, right-command in SuperWhisper), and even then I couldn't submit, so I still had to hit enter for Codex.

Additionally, for composing this message, I'm using the keyboard a ton because there's no way I can find to correct text I've dictated. Do other people get results reliable enough that they don't need backspace anymore? Or... what text do you not care enough to edit? Notes maybe?

My point of comparison is using Dragon like 15 years ago. TBH, while the recognition is better (much better) on Handy/SuperWhisper, everything else felt MUCH worse. With Dragon, you are (were?) totally hands free: you see text as you say it, and you could edit text really easily vocally when it made a mistake (which it did a fair bit, admittedly). And you could press enter and pretty functionally navigate without a keyboard too.

It's weird to see all these apps, and they all have the same limitations?

bambushu 4 hours ago||
Nice to see this running fully local. What model size are you shipping as default, and what's the cold-start time on Apple Silicon? I've been using Whisper locally for meeting transcription, and the biggest friction point is always endpoint detection: knowing when you've stopped talking vs. pausing to think. Curious how you handle that with hold-to-talk.
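For context, the usual trick for the pause-vs-stopped problem (in apps that don't sidestep it with hold-to-talk) is an energy-based endpointer with a "hangover": speech is only declared finished after the signal stays quiet for some minimum stretch. A toy sketch, with made-up thresholds and frame sizes rather than anything Ghost Pepper actually does:

```python
def rms(frame):
    """Root-mean-square energy of one frame of float samples."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5

def find_endpoint(frames, threshold=0.02, hangover_frames=30):
    """Return the index of the first frame of the final silence, i.e. the
    point where the utterance is considered done: the signal must stay
    below `threshold` for `hangover_frames` consecutive frames. Returns
    None if the speaker never stops (or is still mid-pause) at the end.

    With 20 ms frames, hangover_frames=30 means ~600 ms of silence must
    pass before deciding the speaker is done, which is what separates
    "stopped talking" from "pausing to think".
    """
    quiet = 0
    for i, frame in enumerate(frames):
        if rms(frame) < threshold:
            quiet += 1
            if quiet >= hangover_frames:
                return i - hangover_frames + 1  # first frame of the silence
        else:
            quiet = 0  # speech resumed, so it was just a pause
    return None
```

Tuning the hangover is exactly the friction you describe: too short and it cuts off thinkers, too long and it feels laggy. Hold-to-talk dodges the tradeoff by making the user the endpointer.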
fiatpandas 11 hours ago||
The clean-up prompt needs adjusting. If your transcription is first person and in the voice of talking to an AI assistant, it really wants to "answer" you, completely ignoring its instructions. I fiddled with the prompt but couldn't figure out how to make it not want to act like an AI assistant.
__mharrison__ 14 hours ago||
Cool, I've been doing a lot of "coding" (and other typing tasks) recently by tapping a button on my Stream Deck. It starts recording me until I tap it again, at which point it transcribes the recording and plops it into the paste buffer.

The button next to it pastes when I press it. If I press it again, it hits the enter command.

You can get a lot done with two buttons.
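The first button is basically a tiny two-state machine: one tap starts recording, the next tap stops, transcribes, and fills the clipboard. A sketch with the recorder, transcriber, and clipboard left as injected callables (all the names here are hypothetical, not the Stream Deck SDK):

```python
class PushToTranscribe:
    """Toggle-record -> transcribe -> clipboard, driven by one button."""

    def __init__(self, start_rec, stop_rec, transcribe, set_clipboard):
        self.start_rec = start_rec          # begins capturing audio
        self.stop_rec = stop_rec            # stops capture, returns the audio
        self.transcribe = transcribe        # audio -> text
        self.set_clipboard = set_clipboard  # text -> paste buffer
        self.recording = False

    def button_pressed(self):
        if not self.recording:
            self.start_rec()
            self.recording = True
        else:
            audio = self.stop_rec()
            self.set_clipboard(self.transcribe(audio))
            self.recording = False
```

On macOS the clipboard callable could be as simple as piping text to `pbcopy`; the second button (paste, then enter) is the same pattern with its own two states.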

coldfoundry 6 hours ago|
This is exactly what I am building right now, a Stream Deck with two buttons too (push to talk and enter)! It's a sweet little pet project, and has been a blast to build so far. Excited to finally add it to my workflow once it's working well.
ghm2199 8 hours ago|
I've been using Handy for a month and it's awesome. I mainly use it with coding agents or when I don't want to type into text boxes. How is this different?

Part of the reason Handy is awesome is that it uses some of the same Rust infra for integrating with the model, which actually makes it possible to use the code as a library on Android or iOS. I have an Android app that runs a local model on the phone too, using this.
