Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS

Posted by MattHart88 20 hours ago

Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS (github.com)

I built this because I wanted to see how far I could get with a voice-to-text app that used 100% local models so no data left my computer. I've been using it a ton for coding and emails. I'm experimenting with using it as a voice interface for my other agents too. It's 100% open source under the MIT license, and I would love feedback, PRs, and ideas on where to take it.

424 points | 189 comments

gegtik 19 hours ago

How does this compare to the macOS built-in Siri TTS, in quality and in privacy?

realityfactchex 19 hours ago

Exactly my question. I double-tap the control button and macOS does native, local TTS dictation pretty well. (Similar to Keyboard > Enable Dictation setting on iOS.)

The macOS built-in TTS (dictation) seems better than all the 3rd party, local apps I tried in the past that people raved about. I have tried several.

Is this better somehow?

If the 3rd party apps did streaming with typing in place and corrections within a reasonable window when they understand things better given more context, that would be cool. Theoretically, a custom model or UX could be "better" than what comes free built into macOS (more accurate or customizable).

But when I contacted the developer of my favorite one they said that would be pretty hard to implement due to having to go back and make corrections in the active field, etc.

I assume streaming STT in these utilities for Mac will get better at some point, but I haven't seen it yet (been waiting). It seems these tools generally are not streaming, e.g. they want you to finish speaking first before showing you anything. Which doesn't work for me when I'm dictating. I want to see what I've been saying lately, to jog my memory about what I've just said and help guide the next thing I'm about to say. I certainly don't want to split my attention by manually toggling the control (whether PTT or not) periodically to indicate "ok, you can render what I just said now".
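For illustration only (this is not from Ghost Pepper or any app mentioned here): the "go back and make corrections in the active field" step could be sketched as a diff between the text already typed and the recognizer's revised hypothesis, turned into a number of backspaces plus a new suffix to type. The names `correction_ops`, `apply_ops`, and `stream_into_field` are hypothetical, and the text field is simulated with a plain string; a real utility would have to send those keystrokes through an OS-level API, which is likely where the difficulty lies.

```python
def correction_ops(typed: str, revised: str) -> tuple[int, str]:
    """Return (backspaces, text_to_type) that turns `typed` into `revised`."""
    # Count the shared prefix; those characters can stay on screen untouched.
    common = 0
    for a, b in zip(typed, revised):
        if a != b:
            break
        common += 1
    # Delete everything after the shared prefix, then type the new tail.
    return len(typed) - common, revised[common:]


def apply_ops(typed: str, backspaces: int, suffix: str) -> str:
    """Simulate a text field receiving the backspaces and the new suffix."""
    return typed[: len(typed) - backspaces] + suffix


def stream_into_field(hypotheses: list[str]) -> str:
    """Feed successive (hypothetical) partial transcripts into a simulated field."""
    field = ""
    for hypothesis in hypotheses:
        backspaces, suffix = correction_ops(field, hypothesis)
        field = apply_ops(field, backspaces, suffix)
    return field
```

In this sketch, a late correction near the start of a long utterance forces a lot of deleting and retyping, which is one reason a bounded correction window (as suggested above) might be attractive.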

I guess "hold-to-talk" tools are for delivering discrete, fully formed messages, not for longer, running dictation.

AFAICT, TFA is focused on hold-to-talk as the differentiator, over double-tap to begin speaking and double-tap to end speaking?

realityfactchex 13 hours ago

s/TTS/STT/

purplehat_ 19 hours ago

Hi Matt, there are lots of speech-to-text programs out there with varying levels of quality. 100% local is admirable, but it's always a tradeoff and users have to decide for themselves what's worth it.

Would you consider making available a video showing someone using the app?

semiquaver 18 hours ago

Slop

vaulpann 15 hours ago

very cool - huge open source drop!

thatxliner 16 hours ago

Why isn't the cleanup done on the transcription (as opposed to the screen recording)?

dakila5 17 hours ago

MacWhisper is also a good one

douglaswlance 18 hours ago

Does it input the text as soon as it hears it, or does it wait until the end?

sorkhabi 12 hours ago

Well done
