Posted by quesomaster9000 12/29/2025
Z80-μLM is a character-level language model with 2-bit quantized weights ({-2,-1,0,+1}) that runs on a Z80 with 64KB RAM. The entire thing (inference, weights, chat UI) fits in a 40KB .COM file that you can run in a CP/M emulator and, hopefully, even on real hardware!
It won't write your emails, but it can be trained to play a stripped down version of 20 Questions, and is sometimes able to maintain the illusion of having simple but terse conversations with a distinct personality.
--
The extreme constraints nerd-sniped me and forced interesting trade-offs: trigram hashing (typo-tolerant, loses word order), 16-bit integer math, and some careful massaging of the training data meant I could keep the examples 'interesting'.
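To make the trigram-hashing trade-off concrete, here is a minimal Python sketch of one way it could work. The bucket count, the hash function, and the per-word padding are all assumptions chosen for illustration, not details taken from the project. It shows why the representation ignores word order and why a typo only disturbs a few buckets.

```python
def trigram_buckets(text, num_buckets=128):
    """Count hashed character trigrams per word; word order is discarded."""
    counts = [0] * num_buckets
    for word in text.lower().split():
        padded = f" {word} "  # pad so word starts and ends form trigrams too
        for i in range(len(padded) - 2):
            tri = padded[i:i + 3]
            # Simple multiplicative hash kept within 16 bits (arbitrary choice).
            h = 0
            for ch in tri:
                h = (h * 31 + ord(ch)) & 0xFFFF
            counts[h % num_buckets] += 1
    return counts

def overlap(a, b):
    """Number of trigram hits two bucket vectors have in common."""
    return sum(min(x, y) for x, y in zip(a, b))

same_words = trigram_buckets("are you alive") == trigram_buckets("alive you are")
typo_overlap = overlap(trigram_buckets("hello"), trigram_buckets("helo"))
```

Here `same_words` is true because both inputs contain the same words, and "helo" still shares the trigrams " he", "hel" and "lo " with "hello", so most of the signal survives the typo.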
The key was quantization-aware training that accurately models the inference code limitations. The training loop runs both float and integer-quantized forward passes in parallel, scoring the model on how well its knowledge survives quantization. The weights are progressively pushed toward the 2-bit grid using straight-through estimators, with overflow penalties matching the Z80's 16-bit accumulator limits. By the end of training, the model has already adapted to its constraints, so no post-hoc quantization collapse.
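The idea in the paragraph above can be sketched in a few lines of plain Python. This is a toy, single-neuron illustration under stated assumptions, not the project's training code: the function names, learning rate, `grid_pull` term and penalty value are all hypothetical, and the overflow penalty is only added to the reported loss rather than backpropagated.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def quantize(w):
    """Snap a float weight to the nearest point on the grid {-2, -1, 0, +1}."""
    return max(-2, min(1, round(w)))

def int_forward(q_weights, x):
    """Integer dot product; flags any partial sum that leaves the int16 range."""
    acc, overflowed = 0, False
    for q, xi in zip(q_weights, x):
        acc += q * xi
        if not INT16_MIN <= acc <= INT16_MAX:
            overflowed = True
    return acc, overflowed

def train_step(weights, x, target, lr=0.1, grid_pull=0.01, overflow_penalty=10.0):
    q = [quantize(w) for w in weights]
    y_float = sum(w * xi for w, xi in zip(weights, x))  # float pass
    y_int, overflowed = int_forward(q, x)               # quantized pass
    # Straight-through estimator: pretend d(quantize)/dw == 1, so the error
    # of the quantized pass flows back into the float weights unchanged.
    err = (y_int - target) + (y_float - target)
    loss = 0.5 * ((y_int - target) ** 2 + (y_float - target) ** 2)
    if overflowed:
        loss += overflow_penalty  # reported only; not backpropagated here
    updated = []
    for w, qi, xi in zip(weights, q, x):
        w = w - lr * err * xi         # STE gradient step
        w = w - grid_pull * (w - qi)  # nudge toward the current grid point
        updated.append(max(-2.0, min(1.0, w)))
    return updated, loss

w = [0.2, 0.2]
for _ in range(50):
    w, loss = train_step(w, [1, 0], 1)
final_q = [quantize(v) for v in w]
```

In this toy run the first weight is pushed across the rounding boundary until the quantized output matches the target, while the second weight (whose input is always zero) just drifts toward its nearest grid point.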
Eventually I ended up spending a few dollars on the Claude API to generate 20 Questions data (see examples/guess/GUESS.COM). I hope Anthropic won't send me a C&D for distilling their model against the ToS ;P
But anyway, happy code-golf season everybody :)
https://i.imgur.com/6TRe1NE.png
Thank you for posting! It's unbelievable how someone sometimes just drops something that fits right into what you're doing. However bizarre it seems.
I developed a browser-based CP/M emulator & IDE: https://lockboot.github.io/desktop/
I was going to post that, but wanted a 'cool demo' instead, and fell down the rabbit hole.
I wrote a console-based emulator, and a simple CP/M text-adventure game, somewhat recently:
https://github.com/skx/cpmulator/
At some point I should rework my examples/samples to become a decent test-suite for CP/M emulators. There are so many subtle differences out there.
It seems I could even upload a zipfile of my game, but the escape-codes for clearing the screen don't work, sadly:
Although from what I remember from the TV show, most of what he investigates/talks about is indeed path dependence in one way or another, although not everything was like that.
imgur was created as a sort of protest against how terrible most image hosting platforms were back then, went down the drain several years later, and it's now just like they were.
The interaction is surprisingly good despite the lack of attention mechanism and the limitation of the "context" to trigrams from the last sentence.
This could have worked on 60s-era hardware and would have completely changed the world (and science fiction) back then. Great job.
Tin foil hat on: I think that a huge part of the major buyout of RAM by AI companies is to keep people from realising that we are essentially at the home computer revolution stage of LLMs. I have a 1 TB RAM machine which, with custom agents, outperforms all the proprietary models. It's private, secure and won't let me be monetized.
Ultimately, if you can build an ultra tiny model that can talk and learn on the fly, you've just fully localized a personal assistant like Siri.
Not exactly "minimal viable", but a "what if RNNs were good for LLMs" case study.
-> insanely fast on CPUs
Edit: The fact this runs on a smartphone means it is highly relevant. My only question is: how do we give such a model an "unlimited" context window, so it can digest as much as it needs? I know some models know multiple languages; I wouldn't be surprised if sticking to only English would reduce the model size and the need for more hardware, making it even smaller and tighter.
I doubt it would be able to make good use of a large context window, though.
You can buy a kid’s tiger electronics style toy that plays 20 questions.
It’s not like this LLM is a bastion of glorious efficiency; it’s just stripped down to fit on the hardware.
Slack/Teams handles company-wide video calls and can render anything a web browser can, and they run an entire App Store of apps, all from a cross-platform application.
Including Jira in the conversation doesn’t even make logical sense. It’s not a desktop application that consumes memory. Jira has such a wide scope that the word “Jira” doesn’t even describe a single product.
By itself, I would agree.
However, in this metaphor, concrete got 15x cheaper in the same timeframe. Not enough to fully compensate for the difference, but enough that a whole generation are now used to much larger edifices.
(At this point the analogy breaks down because who pays for the software being slower is the users' time, not the taxes paid by a government buying a bridge from a civil engineer…)
* I don't actually buy the argument that the last decade or so of layers of "abstraction" save us developers any time at all, rather I think they're now several layers deep of nested inner platforms that each make things more complicated, but that's a separate entire thread, and blog post: https://benwheatley.github.io/blog/2024/04/07-21.31.19.html
The word processors of 30 years ago often had limits like “50k chapters” and required “master documents” for anything larger. Lotus 1-2-3 had far fewer columns and rows than modern Excel.
Not an excuse, of course, but the older tools are not usable anymore if you have modern expectations.
I would be interested to know the name of the program that did all that within the same app during that time period.
For some reason Slack gets criticism for being “bloated” when it basically does anything you could possibly imagine and is essentially a business communication application platform. Nobody can actually name a specific application that does everything Slack does with better efficiency.
And you bring up things that are supposedly bad about Slack that are basically non-existent boogeymen. UI stutter, load time, and excessive memory use: I can’t think of any time any of these has existed at all or noticeably impacted my experience of Slack on a basic low-end laptop.
Those older apps like MSN Messenger and the original Skype didn’t actually do the things that Slack does now. I mean specifically multiple simultaneous screen shares plus annotations plus HD video feeds (with important features like blurred and replaced backgrounds, added by Skype in 2019) for all participants plus running an entire productivity app in the background at the same time.
Skype didn’t have screen sharing, at all, until 2009.
https://content.dsp.co.uk/history-of-skype
You call this situation “unjustifiable” but we would struggle to find any personal computing device sold at any price point that can’t handle the application smoothly. If I go back five years and buy a $200 mini PC or a $300 iPad or $500 laptop it’s going to run Slack just fine.
Specs are just arbitrary numbers on a box. It doesn’t matter that we got to the moon using a turd and a ham sandwich for a computer.
You can’t accept that the layperson doesn’t care that back in my day we walked uphill both ways for 15 miles on our dial-up connection. If it works, it works.
The 4th Gen iPod touch had 256 meg of RAM and also did those things, with video calling via FaceTime (and probably others, but I don't care). Well, except "cross platform", what with it being the platform.
Remember that Slack does simultaneous multiple participants screen sharing plus annotations plus HD video feeds from all participants plus the entirety of the rest of the app continues to function as if you weren’t on a call at all simultaneously.
It’s an extremely powerful application when you really step back and think about it. It just looks like “text” and boring business software.
And CU-SeeMe did that in the early 90s with even worse hardware: https://en.wikipedia.org/wiki/File:CU-Schools.GIF
Even more broadly, group calls were sufficiently widely implemented to get themselves standardised 29 years ago: https://en.wikipedia.org/wiki/H.323
> It’s an extremely powerful application when you really step back and think about it. It just looks like “text” and boring business software.
The *entire operating system of the phone* is more powerful, and ran on less.
Showing me a black and white <10FPS group video call with no other accompanying software running simultaneously in the 90s is pointless.
Showing me that someone thought of a protocol is pointless. Just look at the history of HDTV. We wouldn’t really describe HDTV as being available to consumers despite it existing in the early 1990s.
I’d also like you to show me a laptop SKU sold in the last 10 years that is incapable of running Slack. If Slack is so inefficient you should be able to find me a computer that struggles with it.
Finally, I’ll remind you that Slack for mobile is a different application that isn’t running in the same way as the desktop app and uses fewer resources. The latest version of it will run on very old phone hardware, going all the way back to the iPhone 8 (2GB RAM), and that’s assuming you even need the latest version for it to function.
1 GHz processor, 512 MB RAM (might even manage 256 MB), 1080p monitor. And "a graphics accelerator", "a sound card", and "a webcam and microphone".
Probably even less on the RAM and CPU.
> and link me to an example program that has 100% feature parity that stays within those specs?
Windows 2000. Or XP.
That's the point. The OS supports all the apps needed to do whatever.
Making Slack into a monolithic blob to do all is just an example of the inner platform effect.
But if you insist: IE 7 would have been able to do all this. It's an app. It's also an example of the inner platform effect.
> Showing me a black and white <10FPS group video call with no other accompanying software running simultaneously in the 90s is pointless.
You should've thought of that before trying to "well akshually" me about which versions of FaceTime support multi-user video calling.
You want video calling? We had that 30 years ago on systems with total RAM smaller than current CPU cache, with internal busses whose bandwidth was less than your mobile's 5G signal, on screens smaller than the icon that has to be submitted to the App Store, with cameras roughly comparable to what we now use for optical mice, running over networks that were MacGyvered onto physical circuits intended for a single analogue voice signal.
Out of everything you list that Slack can do, the only thing that should even be remotely taxing is the HD video calling. Nothing else, at all. And the only reasons for even that to be taxing are that the work has to be correctly offloaded to the GPU and that you want HD. The GPU should handle this kind of thing trivially so long as you know how to use it.
All the "business logic" you mention in the other thread… if you can't handle the non-video business logic needed to be a server hosting 2000 simultaneous users on something with specs similar to a Raspberry Pi, you're not trying hard enough. I've done that. Business logic is the easy part for anything you can describe as "chat". Even if you add some minigames in there and the server is keeping track of the games, it should be a rounding error on a modern system.
Meanwhile I can play back multiple 1080p videos on different monitors, run a high-speed curl download, saturate my gigabit LAN with a bulk transfer, and run a btrfs scrub in the background, all most likely without breaking 2 GB of RAM usage. MPV, VLC, and ffmpeg are all remarkably lightweight.
The only daily application I run that consumes a noticeable quantity of resources is my web browser.
This argument is just so endless and tiring.
Saturating my bandwidth or running a btrfs scrub isn’t accomplishing the business logic I need to do my job, that’s what my web browser is doing.
People making excuses for poorly designed software is what's tiring.
Modern chat apps like Slack, Discord, Teams, etc. are extremely resource-intensive solely because they are a skinned Chrome rendering bloated HTML. That's it. Most of the "actual" engineering is outsourced and externalized to Google, NVIDIA/Intel/AMD, Microsoft/Apple, etc.
That's a bug, not a feature, and strongly coupled to the root cause of Slack's bloat.
The app ecosystem of Slack is largely responsible for its success. You can extend it to do almost anything you want.
Is that true? Slack was one of the first private chats that was not painful to use, circa 2015. I personally hate the integrations and wish they'd just fix the bugs in their core product.
“Planting Undetectable Backdoors in Machine Learning Models”
“ … On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate "backdoor key", the mechanism is hidden and cannot be detected by any computationally-bounded observer. We demonstrate two frameworks for planting undetectable backdoors, with incomparable guarantees. …”
It could with a network this small. More generally this falls under "interpretability."
(edit: change url)
This means that a directly translated 40 KB Z80 executable might be a tight squeeze on that mainframe, because 40K > 32K, counting words, not bytes. Of course if most of that size is just 2-bit weight data then it might not be so bad.
ELIZA running on later hardware would have been a different story, with the Z80 - released in 1976 - being an example.