Posted by quesomaster9000 12/29/2025

Show HN: Z80-μLM, a 'Conversational AI' That Fits in 40KB (github.com)
How small can a language model be while still doing something useful? I wanted to find out, and had some spare time over the holidays.

Z80-μLM is a character-level language model with 2-bit quantized weights ({-2,-1,0,+1}) that runs on a Z80 with 64KB RAM. The entire thing (inference, weights, chat UI) fits in a 40KB .COM file that you can run in a CP/M emulator, and hopefully even on real hardware!
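
For scale, here's a rough Python sketch (mine, not the repo's code) of how a {-2,-1,0,+1} grid packs four weights to a byte; the actual storage layout in the .COM file may well differ:

    def pack_weights(ws):
        """Pack weights (values in {-2,-1,0,+1}), four per byte."""
        out = bytearray()
        for i in range(0, len(ws), 4):
            b = 0
            for j, w in enumerate(ws[i:i+4]):
                assert w in (-2, -1, 0, 1)
                b |= (w + 2) << (2 * j)  # map to unsigned codes 0..3, 2 bits each
            out.append(b)
        return bytes(out)

    def unpack_byte(b):
        """Recover four weights from one packed byte."""
        return [((b >> (2 * j)) & 0b11) - 2 for j in range(4)]

At that density, a small model's whole weight table can plausibly share 40KB with the inference code and chat UI.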

It won't write your emails, but it can be trained to play a stripped-down version of 20 Questions, and it can sometimes maintain the illusion of simple but terse conversations with a distinct personality.

--

The extreme constraints nerd-sniped me and forced interesting trade-offs: trigram hashing (typo-tolerant, but loses word order), 16-bit integer math, and careful massaging of the training data to keep the examples 'interesting'.
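
To make the trigram-hashing trade-off concrete, here's a minimal Python sketch of the idea as I understand it; the bucket count and hash function are my guesses, not the repo's:

    N_BUCKETS = 256  # assumed table size

    def trigram_features(text):
        """Hash each word's character trigrams into a small fixed table."""
        feats = set()
        for word in text.lower().split():
            padded = f"^{word}$"  # mark word boundaries
            for i in range(len(padded) - 2):
                h = 0
                for ch in padded[i:i+3]:
                    h = (h * 31 + ord(ch)) & 0xFFFF  # cheap 16-bit rolling hash
                feats.add(h % N_BUCKETS)
        return feats

A typo like 'helo' vs 'hello' only perturbs a few trigrams, so most features survive, but 'dog bites man' and 'man bites dog' hash to the same bag: that's the word-order loss.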

The key was quantization-aware training that accurately models the inference code's limitations. The training loop runs float and integer-quantized forward passes in parallel, scoring the model on how well its knowledge survives quantization. The weights are progressively pushed toward the 2-bit grid using straight-through estimators, with overflow penalties matching the Z80's 16-bit accumulator limits. By the end of training the model has already adapted to its constraints, so there's no post-hoc quantization collapse.
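
As a hedged illustration of that loop (PyTorch, mine, not the actual training code; the grid, penalty weight, and accumulator model are assumptions):

    import torch

    GRID = torch.tensor([-2.0, -1.0, 0.0, 1.0])  # the 2-bit weight grid
    ACC_MAX = 2**15 - 1                          # signed 16-bit accumulator limit

    def quantize_ste(w):
        """Snap weights to the nearest grid point; straight-through gradient."""
        q = GRID[torch.argmin((w.unsqueeze(-1) - GRID).abs(), dim=-1)]
        return w + (q - w).detach()  # forward uses q, backward sees identity

    def training_step(x, targets, w, loss_fn, pen_weight=0.1):
        logits_f = x @ w.t()                # float forward pass
        logits_q = x @ quantize_ste(w).t()  # quantized pass, run in parallel
        overflow = torch.relu(logits_q.abs() - ACC_MAX).mean()  # 16-bit penalty
        # score how well the model's knowledge survives quantization
        return (loss_fn(logits_f, targets)
                + loss_fn(logits_q, targets)
                + pen_weight * overflow)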

Eventually I ended up spending a few dollars on the Claude API to generate 20 Questions data (see examples/guess/GUESS.COM). I hope Anthropic won't send me a C&D for distilling their model against the ToS ;P

But anyway, happy code-golf season everybody :)

514 points | 122 comments
Y_Y 12/29/2025|
Very cool. Did you consider using sparse weights?
integricho 12/29/2025||
Someone add it to collapseos please :)
bytesandbits 12/29/2025||
it's giving ELIZA! Ha, fun
NooneAtAll3 12/29/2025||
did you measure tokens/s?
codetiger 12/29/2025||
Imagine this working on a Game Boy back in those days. Would've sounded like magic
Sharlin 12/29/2025||
I don’t think this could beat an ELIZA-style bot in how magical it feels, given the extreme terseness of its replies.
lodovic 12/29/2025|||
I love these thought experiments. Looking at the code size, someone could have come up with this back in the day, similar to the idea of a million monkeys on typewriters eventually producing Shakespeare.
alfiedotwtf 12/29/2025|||
And would have lasted 3 minutes.

Speaking of which: I remember my first digital camera (a Fujitsu, 1MP resolution, using SmartMedia)… it used so much power that you could take 20-30 photos and then needed to replace all 4 batteries lol

numpad0 12/29/2025|||
Flip phones have had predictive text since forever. LLMs are just* supercharged predi[ctive text algorithms are computer algorithms that are]
qingcharles 12/29/2025||
"Look, my Game Boy passes the Turing Test!"

*burns you at the stake*
