Posted by quesomaster9000 12/29/2025

Show HN: Z80-μLM, a 'Conversational AI' That Fits in 40KB (github.com)
How small can a language model be while still doing something useful? I wanted to find out, and had some spare time over the holidays.

Z80-μLM is a character-level language model with 2-bit quantized weights ({-2,-1,0,+1}) that runs on a Z80 with 64KB RAM. The entire thing (inference, weights, chat UI) fits in a 40KB .COM file that you can run in a CP/M emulator, and hopefully even on real hardware!
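
For scale, here's a rough Python sketch (mine, not the repo's code) of how a {-2,-1,0,+1} grid packs four weights to a byte; the actual storage layout in the .COM file may well differ:

    def pack_weights(ws):
        """Pack weights (values in {-2,-1,0,+1}), four per byte."""
        out = bytearray()
        for i in range(0, len(ws), 4):
            b = 0
            for j, w in enumerate(ws[i:i+4]):
                assert w in (-2, -1, 0, 1)
                b |= (w + 2) << (2 * j)  # map to unsigned codes 0..3, 2 bits each
            out.append(b)
        return bytes(out)

    def unpack_byte(b):
        """Recover four weights from one packed byte."""
        return [((b >> (2 * j)) & 0b11) - 2 for j in range(4)]

At that density, a small model's whole weight table can plausibly share 40KB with the inference code and chat UI.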

It won't write your emails, but it can be trained to play a stripped-down version of 20 Questions, and it can sometimes maintain the illusion of simple but terse conversations with a distinct personality.

--

The extreme constraints nerd-sniped me and forced interesting trade-offs: trigram hashing (typo-tolerant, but loses word order), 16-bit integer math, and careful massaging of the training data to keep the examples 'interesting'.
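
To make the trigram-hashing trade-off concrete, here's a minimal Python sketch of the idea as I understand it; the bucket count and hash function are my guesses, not the repo's:

    N_BUCKETS = 256  # assumed table size

    def trigram_features(text):
        """Hash each word's character trigrams into a small fixed table."""
        feats = set()
        for word in text.lower().split():
            padded = f"^{word}$"  # mark word boundaries
            for i in range(len(padded) - 2):
                h = 0
                for ch in padded[i:i+3]:
                    h = (h * 31 + ord(ch)) & 0xFFFF  # cheap 16-bit rolling hash
                feats.add(h % N_BUCKETS)
        return feats

A typo like 'helo' vs 'hello' only perturbs a few trigrams, so most features survive, but 'dog bites man' and 'man bites dog' hash to the same bag: that's the word-order loss.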

The key was quantization-aware training that accurately models the inference code's limitations. The training loop runs float and integer-quantized forward passes in parallel, scoring the model on how well its knowledge survives quantization. The weights are progressively pushed toward the 2-bit grid using straight-through estimators, with overflow penalties matching the Z80's 16-bit accumulator limits. By the end of training the model has already adapted to its constraints, so there's no post-hoc quantization collapse.
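
As a hedged illustration of that loop (PyTorch, mine, not the actual training code; the grid, penalty weight, and accumulator model are assumptions):

    import torch

    GRID = torch.tensor([-2.0, -1.0, 0.0, 1.0])  # the 2-bit weight grid
    ACC_MAX = 2**15 - 1                          # signed 16-bit accumulator limit

    def quantize_ste(w):
        """Snap weights to the nearest grid point; straight-through gradient."""
        q = GRID[torch.argmin((w.unsqueeze(-1) - GRID).abs(), dim=-1)]
        return w + (q - w).detach()  # forward uses q, backward sees identity

    def training_step(x, targets, w, loss_fn, pen_weight=0.1):
        logits_f = x @ w.t()                # float forward pass
        logits_q = x @ quantize_ste(w).t()  # quantized pass, run in parallel
        overflow = torch.relu(logits_q.abs() - ACC_MAX).mean()  # 16-bit penalty
        # score how well the model's knowledge survives quantization
        return (loss_fn(logits_f, targets)
                + loss_fn(logits_q, targets)
                + pen_weight * overflow)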

Eventually I ended up spending a few dollars on the Claude API to generate 20 Questions data (see examples/guess/GUESS.COM). I hope Anthropic won't send me a C&D for distilling their model against the ToS ;P

But anyway, happy code-golf season everybody :)

514 points | 122 comments
Y_Y 12/29/2025|
Very cool. Did you consider using sparse weights?
integricho 12/29/2025||
Someone add it to collapseos please :)
bytesandbits 12/29/2025||
it's giving ELIZA! Ha, fun
NooneAtAll3 12/29/2025||
did you measure tokens/s?
codetiger 12/29/2025||
Imagine this working on a Game Boy back in those days. Would've sounded like magic
Sharlin 12/29/2025||
I don’t think this could beat an ELIZA-style bot in how magical it feels, given the extreme terseness of its replies.
lodovic 12/29/2025|||
I love these thought experiments. Looking at the code size, someone could have come up with this back in the day, similar to the idea of a million monkeys on typewriters eventually producing Shakespeare.
alfiedotwtf 12/29/2025|||
And would have lasted 3 minutes.

Speaking of which: I remember my first digital camera (a Fujitsu, 1MP resolution, using SmartMedia)… it used so much power that you could take 20-30 photos and then needed to replace all 4 batteries lol

numpad0 12/29/2025|||
Flip phones have had predictive text since forever. LLMs are just* supercharged predi[ctive text algorithms are computer algorithms that are]
qingcharles 12/29/2025||
"Look, my Game Boy passes the Turing Test!"

*burns you at the stake*
