Posted by tin7in 1/15/2026
My regular cycle is to talk informally to the CLI agent and ask it to “say back to me what you understood”, and it almost always produces a nice clean and clear version. This simultaneously works as confirmation of its understanding and also as a sort of spec which likely helps keep the agent on track.
UPDATE - just tried Handy with Parakeet v3, and it works really well too, so I'll use it instead of VoiceInk for a few days. I also just discovered that turning on the "debug" UI with Cmd-Shift-D shows additional options like post-processing and appending a trailing space.
I want to be able to say things like "cd ~/projects" or "git push --force".
Likewise "cd home slash projects" into "cd ~/projects".
Maybe with some fine tuning, maybe without.
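A first pass at this doesn't even need fine-tuning; a small rule table gets you surprisingly far. A minimal sketch (the spoken phrases and regexes here are made up for illustration, not from any of these apps):

```python
import re

# Hypothetical rules mapping spoken operator names to their shell spellings.
SPOKEN_RULES = [
    (re.compile(r"\bhome slash\b", re.IGNORECASE), "~/"),
    (re.compile(r"\bdash dash force\b", re.IGNORECASE), "--force"),
    (re.compile(r"\btilde\b", re.IGNORECASE), "~"),
]

def normalize_spoken_command(transcript: str) -> str:
    """Rewrite spoken forms into shell text."""
    text = transcript
    for pattern, replacement in SPOKEN_RULES:
        text = pattern.sub(replacement, text)
    # Collapse whitespace left behind by substitutions, e.g. "~/ projects".
    text = re.sub(r"~/\s+", "~/", text)
    return re.sub(r"\s+", " ", text).strip()

print(normalize_spoken_command("cd home slash projects"))   # cd ~/projects
print(normalize_spoken_command("git push dash dash force")) # git push --force
```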
I do, however, wonder if there is a way all these STT tools can get to the next level. The generated text should not just be a verbatim copy of what I said; depending on the context, it should elaborate. For example, if my cursor is actively inside an editor/IDE with some code, my coding-related verbal prompts should actually generate the right/desired code in that IDE.
Perhaps this is a bit of combining STT with computer use.
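As a toy illustration of that routing (not anyone's actual implementation): pick the post-processing instruction based on whichever app has focus. Every app key and prompt below is invented; a real tool would ask the OS for the frontmost window:

```python
# Context-aware dictation sketch: the active app selects the LLM instruction.
CONTEXT_PROMPTS = {
    "ide": "Turn this spoken request into code for the file under the cursor.",
    "terminal": "Turn this into a single shell command; output only the command.",
    "email": "Rewrite this dictation as polished prose, keeping my wording.",
}

DEFAULT_PROMPT = "Transcribe verbatim, fixing punctuation and casing."

def build_prompt(transcript: str, active_app: str) -> str:
    """Choose the post-processing instruction based on where the cursor is."""
    instruction = CONTEXT_PROMPTS.get(active_app, DEFAULT_PROMPT)
    return f"{instruction}\n\nDictation: {transcript}"

print(build_prompt("loop over the users and print each name", "ide"))
```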
I have a Claude skill, `/record`, that runs the CLI to start a new recording. I debug, research, etc., then say "finito" (or choose your own stop word). It outputs a markdown file with your transcribed speech interleaved with screenshots and text that you copied. You can say other keywords like "marco" and it will take a screenshot hands-free.
When the session ends, Claude reads the timeline (e.g. looks at the screenshots) and gets to work.
I can clean it up and push to GitHub if anyone would get use out of it.
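In the meantime, the timeline logic is roughly this shape. A sketch with the keywords from above; the event stream, screenshot path, and everything else are stand-ins for the real plumbing:

```python
import datetime

STOP_WORD = "finito"       # ends the session
SCREENSHOT_WORD = "marco"  # hands-free screenshot

def build_timeline(events):
    """Fold (kind, payload) events into a markdown timeline.

    In the real skill the events would come from live transcription and a
    clipboard watcher; here they're just an iterable for illustration.
    """
    lines = [f"# Session {datetime.date.today()}"]
    for kind, payload in events:
        if kind == "speech" and STOP_WORD in payload.lower():
            break  # "finito" closes the recording
        if kind == "speech" and SCREENSHOT_WORD in payload.lower():
            lines.append("![screenshot](screenshots/shot-001.png)")  # placeholder path
        elif kind == "speech":
            lines.append(f"> {payload}")
        elif kind == "clipboard":
            lines.append(f"    {payload}")  # indented = code block in markdown
    return "\n\n".join(lines)

demo = [
    ("speech", "the request is failing with a 403"),
    ("clipboard", "curl -v https://example.com/api"),
    ("speech", "marco"),
    ("speech", "finito"),
]
print(build_timeline(demo))
```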
I initially had a ton of keyboard shortcuts in Handy for myself when I had a broken finger and was in a cast. It let me play with the simplest form of this contextual thing, as shortcuts could effectively be mapped to certain apps with very clear use cases.
There’s also more recent-ish research, like https://dl.acm.org/doi/fullHtml/10.1145/3571884.3597130
That CLI bit I mentioned earlier is already possible. For instance, on macOS there’s an app called MacWhisper that can send dictation output to an OpenAI‑compatible endpoint.
It uses a 'character-typing' method instead of clipboard injection, so it's compatible with pretty much anything remote. I also kept it super lightweight (<50MB RAM) for Windows users who don't want to run a full local server stack.
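For anyone curious, the two injection styles look roughly like this. This sketch uses pynput and pyperclip as stand-ins; the app's actual internals aren't described here, so treat all of it as assumed:

```python
import time
from pynput.keyboard import Controller, Key

keyboard = Controller()

def inject_by_typing(text: str, delay: float = 0.005) -> None:
    """Emit the transcript as individual keystrokes.

    This is what makes it work over remote desktops and in fields that
    block paste; the trade-off is that long transcripts take longer.
    """
    for ch in text:
        keyboard.type(ch)
        time.sleep(delay)  # pacing so slow remote targets don't drop keys

def inject_by_paste(text: str) -> None:
    """Clipboard injection for comparison: copy, then send the paste chord."""
    import pyperclip  # assumed clipboard helper, also not from the thread
    pyperclip.copy(text)
    with keyboard.pressed(Key.cmd):  # Key.ctrl on Windows/Linux
        keyboard.press("v")
        keyboard.release("v")
```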
Cool to see Handy using the newer models—local voice tech is finally getting good.
P.S. The post-processing you're talking about: wouldn't it be awesome?
There is also Post Processing where you can rerun the output through an LLM and refine it, which is the closest to what Wispr Flow is doing.
This can be found in the debug menu in the GUI (Cmd + Shift + D).
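The shape of that post-processing step is a single call to any OpenAI-compatible endpoint. A minimal sketch where the URL, model name, and prompt are placeholders, not Handy's actual values:

```python
import requests

ENDPOINT = "http://localhost:8080/v1/chat/completions"  # any compatible server
CLEANUP_PROMPT = ("Clean up this dictation: fix punctuation and casing, drop "
                  "filler words, and otherwise keep the wording unchanged.")

def refine(transcript: str) -> str:
    """Send the raw transcript through an LLM and return the refined text."""
    resp = requests.post(ENDPOINT, json={
        "model": "local-model",  # placeholder model id
        "messages": [
            {"role": "system", "content": CLEANUP_PROMPT},
            {"role": "user", "content": transcript},
        ],
    }, timeout=30)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```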
https://github.com/cjpais/Handy/actions/runs/21025848728
There is also LLM post-processing, which can do this, and the built-in dictionary feature.
Superwhisper — Been using it a long time. It's paid, with a lifetime option available. Tons of features. Language models are built right in at no additional charge. The solo dev is epic; I may defer upgrades to avoid occasional bugs/regressions (hey, it's complex software).
Trying each for a few minutes:
Hex — Feels like the leanest (& cleanest) of the free options mentioned for Mac in this thread.
Fluid Voice — Offers a unique feature: a real-time view of your speech as you talk! Superwhisper has this, but only with an online model. (You can't see your entire transcript in Fluid, though; the recording window shows only about one sentence at a time. Of course, you do see everything when you complete your dictation.)
Handy — Pink and cute. I like the history window. As far as clipboard handling goes, I might note that the "don't modify clipboard" setting is more of a "restore clipboard" setting. It doesn't need as many permissions as Hex, though, because it's willing to move clipboard items around a bit, if I'm not mistaken.
Note: Hex seems to be upset about me installing all the others... lots of restarting between installs all around. Each has something to offer.
---
Big shout-out to Nvidia for open-sourcing Parakeet; all of these apps are lightning fast.
Also, I'm partial to being able to stream transcriptions to the cursor in any field, or at least view them live like Fluid (or Superwhisper online). I know it's complex because the models transcribe the whole file for accuracy. (I'm OK with seeing a lower-quality transcript in real time and waiting a second for the higher-quality version to paste at the end.)
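That two-pass display is mostly bookkeeping: type the streaming partials, track what's on screen, then erase and retype the accurate pass. A sketch with the keystroke layer stubbed out (none of this is from any of these apps):

```python
def stream_then_finalize(partials, final_text, emit, backspace):
    """Show rough partial transcripts live, then swap in the accurate pass.

    `emit` types text at the cursor and `backspace` deletes n characters;
    in a real app both would be keystroke injection. `partials` is the
    incremental decoder output, `final_text` the whole-file re-decode.
    """
    shown = ""
    for partial in partials:
        # Keep the common prefix; rewind only what the partial revised.
        common = 0
        while common < min(len(shown), len(partial)) and shown[common] == partial[common]:
            common += 1
        backspace(len(shown) - common)
        emit(partial[common:])
        shown = partial
    backspace(len(shown))  # erase the rough pass...
    emit(final_text)       # ...and type the high-quality version

# Tiny demo with print stubs standing in for keystroke injection:
stream_then_finalize(
    ["helo", "hello wor", "hello world"],
    "Hello, world.",
    emit=lambda s: print("type:", repr(s)),
    backspace=lambda n: print("backspace x", n),
)
```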
Handy's first release was June 2025; OpenWhispr followed a month later. Handy has ~11k GitHub stars, OpenWhispr ~730.
I built OW because I was tired of paying for WisprFlow. I'd say it's more flexible by design: Whisper.cpp (CPU + GPU) for super-fast local transcription, Parakeet in progress, local or cloud LLMs for cleanup (Qwen, Mistral, Gemini, Anthropic, OpenAI, Groq, etc.), and bring-your-own API keys!
Handy is more streamlined for sure!
Would love any feedback :)
Handy's UI is so clean and minimalistic that you always know what to do and where to go. Yes, it lacks some advanced features, but honestly, I've been using it for two months now and I've never looked back or searched for any other STT app.
The UI is well thought out, with just the right amount of settings for my usage.
Incredible!
Btw, do you know what "discharging the model" does? It's set to "never" by default; I tried to check whether it has an impact on RAM or CPU, but it doesn't seem to do anything.