
Posted by davidye324 6/30/2025

Show HN: Local LLM Notepad – run a GPT-style model from a USB stick (github.com)
What it is: A single 45 MB Windows .exe that embeds llama.cpp and a minimal Tk UI. Copy it (plus any .gguf model) to a flash drive, double-click on any Windows PC, and you're chatting with an LLM: no admin rights, no cloud, no network.

Why I built it: Existing "local LLM" GUIs assume you can pip install, pass long CLI flags, or download GBs of extras.

I wanted something my less-technical colleagues could run during a client visit by literally plugging in a USB drive.

How it works: A PyInstaller one-file build bundles the Python runtime, llama_cpp_python, and the UI into a single PE.
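A one-file build of this shape can be sketched with a minimal PyInstaller spec file (a spec is just Python; the entry-point and output names below are assumptions, not the project's actual spec):

```python
# notepad.spec -- hypothetical one-file PyInstaller spec (names are illustrative)
a = Analysis(
    ["notepad.py"],               # Tk UI entry point (assumed name)
    hiddenimports=["llama_cpp"],  # make sure llama_cpp_python gets bundled
)
pyz = PYZ(a.pure)
exe = EXE(
    pyz,
    a.scripts,
    a.binaries,                   # passing binaries/datas straight into EXE
    a.datas,                      # (instead of COLLECT) is what makes it one-file
    name="LocalLLMNotepad",
    console=False,                # GUI app: no console window
)
```

Built with `pyinstaller notepad.spec`, this yields one self-extracting PE that unpacks to a temp dir at launch, which is why no install step or admin rights are needed.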

On first launch, it memory-maps the .gguf; subsequent prompts stream at ~20 tok/s on an i7-10750H with gemma-3-1b-it-Q4_K_M.gguf (0.8 GB).
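The memory-mapping is what makes "loading" a multi-GB model near-instant: the OS pages weights in lazily, and repeated launches hit the page cache. The idea can be sketched with the stdlib, reading just the 4-byte GGUF magic without touching the rest of the file:

```python
import mmap

def check_gguf(path: str) -> bool:
    """Memory-map a .gguf file and verify its 4-byte magic ("GGUF").

    With mmap, only the pages actually read are faulted in, so this
    returns immediately even for an 0.8 GB (or 8 GB) model file.
    """
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            return mm[:4] == b"GGUF"
        finally:
            mm.close()
```

llama.cpp does the same thing internally when mmap is enabled, which is the behavior the post is describing.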

A tick-driven render loop keeps the UI responsive while llama.cpp crunches.
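A tick-driven loop of this kind is typically a background thread pushing tokens into a queue while the Tk main loop polls it with `after()`. A minimal self-contained sketch (function names are assumptions; the worker is faked with a fixed token list instead of a real llama.cpp call):

```python
import queue
import threading

token_q: "queue.Queue[str]" = queue.Queue()

def generate(prompt: str) -> None:
    """Worker: stream tokens into the queue. Faked here so the
    sketch runs without llama_cpp_python installed."""
    for tok in ["Hello", ", ", "world"]:
        token_q.put(tok)

def drain(render) -> list:
    """One UI 'tick': pull everything queued so far without blocking."""
    drained = []
    while True:
        try:
            tok = token_q.get_nowait()
        except queue.Empty:
            break
        render(tok)
        drained.append(tok)
    return drained

if __name__ == "__main__":
    import tkinter as tk

    root = tk.Tk()
    text = tk.Text(root)
    text.pack()

    def tick():
        drain(lambda t: text.insert("end", t))
        root.after(30, tick)  # ~33 ticks/s; UI never blocks on generation

    threading.Thread(target=generate, args=("hi",), daemon=True).start()
    tick()
    root.mainloop()
```

Because the Tk thread only ever does quick non-blocking queue drains, typing and scrolling stay smooth even while the model is mid-generation.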

A parser bold-underlines every token that originated in the prompt; Ctrl+click pops a “source viewer” to trace facts. (Helps spot hallucinations fast.)
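The source-tracing idea can be sketched as a word-level check: build a set of normalized words from the prompt and flag each output word that appears in it. (Word-level is a simplification of my own; the actual parser presumably works on model tokens.)

```python
import re

def mark_prompt_words(prompt: str, output: str) -> list:
    """Return (word, came_from_prompt) pairs for the model output.

    Words also present in the prompt get flagged True so the UI can
    bold-underline them; unflagged words are candidates to fact-check.
    """
    prompt_words = {w.lower() for w in re.findall(r"\w+", prompt)}
    return [(w, w.lower() in prompt_words) for w in re.findall(r"\w+", output)]
```

For example, with the prompt "The capital of France", the output word "capital" would be flagged as prompt-sourced while a novel word like "Paris" would not, which is exactly the hallucination-spotting signal described above.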

40 points | 9 comments
gxonatano 7/1/2025|
> walk up to any computer

Windows users seem to think their OS is ubiquitous. But in fact for most hackers reading this site, using Windows is a huge step backwards in productivity and capability.

jaggs 7/1/2025||
However, the facts speak otherwise: Windows at 70%+ versus 4.1% for Linux globally. https://gs.statcounter.com/os-market-share/desktop/worldwide
thebitstick 7/1/2025||
> But in fact for most hackers reading this site

https://survey.stackoverflow.co/2024/technology#1-operating-...

hereme888 7/7/2025|||
idk... I gave up after years of trying to switch to Linux as my main OS, given the obvious difference in stability, support, ecosystem, and...yes, even responsiveness in many apps.
Zetaphor 7/1/2025||
Surely you're hinting at Linux, in which case this runs fine with WINE
exe34 7/1/2025||
Why not llamafile? Runs on everything from toothbrushes to toasters...
romperstomper 7/2/2025|
Seconded for Llamafile; here is a link for reference: https://github.com/Mozilla-Ocho/llamafile . It does work on all major platforms, and its tooling makes it easy to create new llamafiles from new models. The only caveat is Windows, which has a 4 GB limit on executable files, so there you have to pair a small llamafile launcher with the .gguf file itself. But that approach works everywhere anyway.
ensocode 7/1/2025||
Interesting, will definitely try it. What can be expected? What other models perform OK with this?
ge96 7/1/2025|
Wonder if you can use/interface with those Coral accelerator boards