Posted by quesomaster9000 12/29/2025

Show HN: Z80-μLM, a 'Conversational AI' That Fits in 40KB (github.com)
How small can a language model be while still doing something useful? I wanted to find out, and had some spare time over the holidays.

Z80-μLM is a character-level language model with 2-bit quantized weights ({-2,-1,0,+1}) that runs on a Z80 with 64KB RAM. The entire thing (inference engine, weights, and chat UI) fits in a 40KB .COM file that you can run in a CP/M emulator, and hopefully even on real hardware!
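
At 2 bits per weight, ~150k parameters pack into roughly 37KB, which is what makes the 40KB .COM plausible. The actual on-disk layout isn't described in the post; here is a minimal sketch of one way to pack four 2-bit weights per byte (the code-to-value mapping is my assumption, not the project's):

```python
# Hypothetical packing of four 2-bit weights per byte. The real Z80-μLM
# format may differ; this only illustrates the storage density.

# Map each grid value {-2, -1, 0, +1} to a 2-bit code 0..3.
CODE = {-2: 0, -1: 1, 0: 2, 1: 3}
VALUE = {c: w for w, c in CODE.items()}

def pack(weights):
    """Pack a list of grid weights (length a multiple of 4) into bytes."""
    out = bytearray()
    for i in range(0, len(weights), 4):
        b = 0
        for j, w in enumerate(weights[i:i + 4]):
            b |= CODE[w] << (2 * j)  # weight j occupies bits 2j..2j+1
        out.append(b)
    return bytes(out)

def unpack(data, n):
    """Recover n weights from packed bytes."""
    return [VALUE[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(n)]

ws = [-2, -1, 0, 1, 1, 0, -1, -2]
packed = pack(ws)
assert unpack(packed, len(ws)) == ws
assert len(packed) == 2  # four weights per byte
```

On the Z80 side, unpacking two bits at a time is just a couple of shifts and a mask, so the format costs almost nothing at inference time.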

It won't write your emails, but it can be trained to play a stripped-down version of 20 Questions, and is sometimes able to maintain the illusion of having simple but terse conversations with a distinct personality.

--

The extreme constraints nerd-sniped me and forced interesting trade-offs: trigram hashing (typo-tolerant, loses word order), 16-bit integer math, and some careful massaging of the training data meant I could keep the examples 'interesting'.

The key was quantization-aware training that accurately models the inference code limitations. The training loop runs both float and integer-quantized forward passes in parallel, scoring the model on how well its knowledge survives quantization. The weights are progressively pushed toward the 2-bit grid using straight-through estimators, with overflow penalties matching the Z80's 16-bit accumulator limits. By the end of training, the model has already adapted to its constraints, so no post-hoc quantization collapse.
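
The training loop described above can be sketched on a toy linear model. The forward pass uses weights snapped to the 2-bit grid, while gradients update the float "shadow" weights as if no quantization had happened (the straight-through estimator), plus a penalty when integer accumulators would exceed 16 bits. The learning rate, penalty weight, and toy model are my assumptions, not the project's code:

```python
import numpy as np

GRID = np.array([-2.0, -1.0, 0.0, 1.0])
ACC_MAX = 32767  # signed 16-bit accumulator limit on the Z80

def quantize(w):
    """Snap each float weight to the nearest 2-bit grid value."""
    return GRID[np.abs(w[..., None] - GRID).argmin(axis=-1)]

def train_step(w_float, x, y, lr=0.02, overflow_lambda=1e-4):
    """One straight-through update on a toy linear model y ≈ x @ w."""
    w_q = quantize(w_float)
    acc = x @ w_q                      # forward pass uses quantized weights
    grad = x.T @ (acc - y) / len(y)    # backward pretends w_q == w_float
    # Penalize samples whose integer dot product would overflow 16 bits.
    mask = np.abs(acc) > ACC_MAX
    grad += overflow_lambda * (x.T @ (np.sign(acc) * mask))
    return w_float - lr * grad         # update the float shadow weights

rng = np.random.default_rng(0)
x = rng.integers(-4, 5, size=(64, 8)).astype(float)
w_true = quantize(rng.normal(size=8))  # a target already on the grid
y = x @ w_true
w = rng.normal(size=8) * 0.1           # float shadow weights, near zero
loss0 = np.mean((x @ quantize(w) - y) ** 2)
for _ in range(300):
    w = train_step(w, x, y)
loss = np.mean((x @ quantize(w) - y) ** 2)
assert loss < 0.25 * loss0             # the model learned *through* the grid
```

Because the loss is always measured through `quantize`, there is nothing left to collapse when training ends: the deployed integer model is the one that was scored all along.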

Eventually I ended up spending a few dollars on Claude API to generate 20 questions data (see examples/guess/GUESS.COM), I hope Anthropic won't send me a C&D for distilling their model against the ToS ;P

But anyway, happy code-golf season everybody :)

514 points | 122 comments
orbital-decay 12/29/2025|
Pretty cool! I wish free-input RPGs of old had fuzzy matchers. They worked by exact keyword matching and it was awkward. I think the last game of that kind (where you could input arbitrary text when talking to NPCs) was probably Wizardry 8 (2001).
Peteragain 12/29/2025||
There are two things happening here: a really small LLM mechanism, which is useful for thinking about how the big ones work, and a reference to the well-known phenomenon, commonly and dismissively referred to as a "trick", in which humans want to believe. We work hard to account for what our conversational partner says. Language in use is a collective cultural construct. By this view the real question is how and why we humans understand an utterance in a particular way. Eliza, Parry, and the Chomsky bot at http://chomskybot.com work on this principle. Just sayin'.
nrhrjrjrjtntbt 12/29/2025|
MAYBE
cwmoore 12/29/2025||
Universally correct reply, although honestly a bit vague.
Peteragain 12/30/2025|||
Fair. The background reading is the EMCA stuff: conversation analysis (cf. Sacks et al.) and ethnomethodology (Garfinkel). And Vygotsky (cf. Kozulin). People such as Robert Moore at IBM and Lemon at Heriot-Watt work in this space, but there is no critical mass in the face of LLM mania.
Peteragain 12/30/2025|||
And the Chomskybot analysis is quite enlightening.
Zee2 12/29/2025||
This is super cool. Would love to see a Z80 simulator set up with these examples to play with!
Imustaskforhelp 12/29/2025||
100% Please do this! I wish the same
dmd 12/31/2025||
https://3e.org/private/z80ulmweb/

It's just one-shot AI slop - literally, the prompt was 'make a web based version of [github url of this project]' and it spat this out. It appears to work fine.

I'll keep it up for a couple of months and then it'll be auto-deleted, no sense in keeping it around longer than that.

bartread 12/29/2025||
This is excellent. Thing I’d like to do if I had time: get it running on a 48K Spectrum. 10 year old me would have found that absolutely magical back in the 1980s.
tomduncalf 12/29/2025|
This was my first thought too haha. That would be mind blowing
bartread 12/29/2025||
Yeah, very WarGames.

EDIT: Actually thinking about it some more…

- Imagine what you could do with 16-bit games of the era with one or more of these models embedded. Swap the model depending on the use case within the game. Great for adventures, RPGs, strategy, puzzle, and trading games (think Elite). With 512K or 1MB of RAM, plus 2-4 floppies (which became increasingly common as the era wore on), you could probably do a lot, especially if conversations could steer the game toward different outcomes.

- Back in the day nobody was really trying to do anything serious with AI on 8-bit or even most 16-bit machines, because nobody thought they were powerful enough to be useful. Now the thinking has changed: how much somewhat-useful intelligence can I cram into the least powerful device, even if that's only for fun?

- Imagine showing this running on a CP/M machine, like the C128, to a serious AI researcher working back in the 1980s. Minds blown, right?

- Now spool forward 10 years into the 1990s and think what PC hardware of that era would have been capable of with these limited language models. I wonder what that era might have looked like with something that seems like somewhat useful conversational AI? A sort of electro-steampunk-ish vibe maybe? People having really odd conversations with semi-capable home automation running via their PCs.

jrdres 1/4/2026||
It runs, but it would be very slow on actual hardware.

I tried on a cycle-accurate emulator of a TRS-80 Model I with Omikron CP/M mapper. Most Z-80 machines of the time were 4MHz, but the TRS-80 was only 1.77 MHz.

1. Type "GUESS", get question prompt.

2. User types: "Are you an animal?", ENTER key

3. Wait 25 seconds

4. Program prints "N"

5. Wait 20 seconds

6. Program prints "O"

7. Wait 23 seconds

8. Program prints linefeed, returns to question prompt

Total time to return 2-char answer to user's question: 1 min 9 sec or so. I bet a longer answer would take proportionally longer.

"The wonder isn't that it does it well, it's a wonder it does it at all."

gp2000 1/5/2026|
Though it'll still be kinda slow on a Model I, I've written Z-80 code for the network evaluation that's about 9 times faster. I imagine the pull request will end up in the main repo, but for now you can find it at https://github.com/gp48k/z80ai

I think I can do a little bit better; maybe 10% faster.

gp2000 1/7/2026||
Well, I was being pessimistic. Just pushed an update that slightly more than doubles the execution speed, with a PR to the main repo pending. It is very close to 20 times faster than the original.
MagicMoonlight 12/29/2025||
What I really want is a game where each of the NPCs has a tiny model like this, so you can actually talk to them.
GuB-42 12/30/2025|
I thought about this. Chatbots existed well before LLMs (Eliza: 1966!), and the only time I have seen a commercially successful game with a (very simple) chatbot was Quake III Arena!

Quake 3 is probably the last game where you would expect a chatbot, as there are few games where storytelling matters less, and it is a little-known feature, but Quake 3 bots can react to what you say in the chat, in addition to their usual taunts.

But that's the thing: Quake 3 can do it because it is inconsequential. In a story-driven game like an RPG, NPCs have a well-defined spot in the story and gameplay; they tell you exactly what you need to know, so as not to disrupt the flow of the story. Tell you too much, and they spoil the big reveal; tell you too little, and you don't know what to do; tell you irrelevant details, and you get lost chasing them. Dialogue has to be concise and to the point, so that players who don't really care still know how to advance the story, but with enough flavor to make the world feel alive. It is really hard to find that balance, and if on top of it you have to incorporate a chatbot, it borders on impossible.

It looks like a good idea on the surface, but it most likely isn't, unless it is clearly not part of the main gameplay loop, as in Quake 3.

Some people have had some success using a (big) LLM as a DM in D&D, which I think is easier, since it can make up the story as it goes; it is much harder to make up game elements in a computer RPG that are not programmed in.

vatary 12/29/2025||
It's pretty obvious this is just a stress test for compressing and running LLMs. It doesn't have much practical use right now, but it shows us that IoT devices are gonna have built-in LLMs really soon. It's a huge leap in intelligence, kind of like the jump from apes to humans. That is seriously cool.
acosmism 12/29/2025|
I'll echo that: practicality only surfaces once it's apparent what can be done. Yeah, this feels like a "DOOM on a pregnancy test" type of moment.
anonzzzies 12/29/2025||
Luckily I have a large number of MSX computers, ZX and Amstrad CPC machines, etc., and even one multiprocessor Z80 CP/M machine for the real power. Wonder how gnarly this is going to perform with bank switching, though. Probably not good.
alfiedotwtf 12/29/2025||
An LLM in a .com file? Haha made my day
teaearlgraycold 12/29/2025|
SLM
quesomaster9000 12/29/2025||
All the 'Small' language models and the 'TinyML' scene in general tend to bottom out at a million parameters, hence I thought 'micro' was more apt at ~150k params.
jacquesm 12/29/2025|
Between this and RAM prices Zilog stock must be up! Awesome hack. Now apply the same principles to a laptop and take a megabyte or so, see what that does.