Implementing a Z80 / ZX Spectrum emulator with Claude Code

Posted by antirez 2 days ago

Implementing a Z80 / ZX Spectrum emulator with Claude Code (antirez.com)

110 points | 55 comments

jaen 11 hours ago|

There isn't any attempt to falsify the "clean room" claim in the article - a rational approach would be to not provide any documents about the Z80 and the Spectrum, and just ask it to one-shot an emulator and compare the outputs...

If the one-shot output resembles anything working (and I am betting it will), then obviously this isn't clean room at all.

the_af 1 hour ago||

Even without internet access, probably everything there is to say about Z80/Speccy emulators was already in its training set.

measurablefunc 1 hour ago|||

That the author just trusts the agent to not use the internet b/c he wrote so in the instructions should tell you all you need to know. It's great he managed to prompt it w/ the right specification for writing yet another emulator, but I don't think he understands how LLMs actually work, so most of the commentary on what's going on with the "psychology" of the LLM should be ignored.

antirez 11 hours ago||

You didn't read the full article. The last paragraph talks about this specifically.

tredre3 1 hour ago||

In the last paragraph you handwave that all the Z80 and ZX Spectrum documentation is likely already in the model anyway... Choosing to not provide the documents/websites might then require more prompting to finish the emulator, but the knowledge is there. You can't clean room with a large LLM. That's delusion!

stevekemp 10 hours ago||

I grew up with the Spectrum, and wrote a CP/M emulator a while back. I'd be curious to see how complete it would get.

I struggled a lot with some complex software, which worked on some emulators and failed on others (and mine).

For example one bug I had, which is still outstanding, relates to the Hisoft C compiler:

https://github.com/skx/cpmulator/issues/250

But I see that my cpm-dist repository is referenced in the download script so that made me happy!

It's great to see people still using CP/M, writing software for it, and sharing the knowledge. Though I do think the choice to implement the CCP in C, rather than using a genuine one, is an interesting one, and a bit of a cheat. It means that you cannot use "SUBMIT" and other common-place binaries/utilities.

antirez 10 hours ago|

Thank you for your work on CP/M, Steve!

avadodin 13 hours ago||

So what you're saying is that it's not just the machine-readable documentation built over decades of the officially undocumented behavior of Z80 opcodes—often provided under restrictive licenses—it's also the "known techniques and patterns" of emulator code—often provided under restrictive licenses.

hoc 1 hour ago||

Great project and write-up. I wonder whether most of those "hints" are really needed, though, as you are already using Claude CODE. Aren't things like "simple" and "clean" assumed to be part of its system prompt already (individual documentation style etc. can't be, of course)? While they were useful when using a general LLM for coding, I would think that they are now part of the overall setup of any coding agent. These days I run more into problems with language and API version drifts, even when specified beforehand.

itomato 11 hours ago||

All the design hints required for this or any other type of agentic "set it and forget it" development are interesting to me, because they enable the result but also lock in less-than-desirable results that exhibit a miss "like simulating a 2Mhz clock".

What if Agents were hip enough to recognize that they have navigated into a specialized area and need additional hinting? "I'm set up for CP/M development, but what I really need now is Z80 memory management technique. Let me swap my tool head for the low-level Z80 unit..."

We can throw RAGs on the pile and hope the context window includes the relevant tokens, but what if there were pointers instead?

cbolton 8 hours ago||

I asked Gemini to reproduce the poem "The Road Not Taken". I got it in full (as far as I can tell without Gemini fetching anything from the web). I didn't provide any verse of the poem so I guess that counts as a clean room "implementation"?

rjh29 13 hours ago||

No Carmack or Stallman. Just the right person at the right time.

ontouchstart 1 hour ago||

Is it possible to build a full OS emulator on top of MMIX?

> The above tools could theoretically be used to compile, build, and bootstrap an entire FreeBSD, Linux, or other similar operating system kernel onto MMIX hardware, were such hardware to exist.

https://en.wikipedia.org/wiki/MMIX

ralferoo 13 hours ago||

The problem is that it will have been trained on multiple open source Spectrum emulators. Even "don't access the internet" isn't going to help much if it can parrot someone else's emulator verbatim just from training.

Maybe a more sensible challenge would be to describe a system that hasn't been emulated before (or hasn't had an emulator source released publicly, as far as you can tell from the internet) and then try it.

For fun, try using obscure CPUs giving it the same level of specification as you needed for this, or even try an imagined Z80-like but swapping the order of the bits in the encodings and different orderings for the ALU instructions and see how it manages it.
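To make the "swap the order of the bits in the encodings" idea concrete, here is a minimal sketch in Python. The permutation `PERM` and the helper names `permute_bits` and `invert` are made up for illustration; nothing like this appears in the article. It only shows how an opcode byte of an imagined Z80-like CPU could be derived from the real one, and mapped back.

```python
def permute_bits(value: int, perm: list[int]) -> int:
    """Return an 8-bit value whose bit i is taken from input bit perm[i]."""
    out = 0
    for i, src in enumerate(perm):
        out |= ((value >> src) & 1) << i
    return out


def invert(perm: list[int]) -> list[int]:
    """Build the inverse permutation, so scrambled opcodes can be mapped back."""
    inv = [0] * len(perm)
    for i, src in enumerate(perm):
        inv[src] = i
    return inv


# Hypothetical scrambling: reverse the bit order of every opcode byte.
PERM = [7, 6, 5, 4, 3, 2, 1, 0]
INV = invert(PERM)

# On a real Z80, 0x3E encodes "LD A,n". The imagined CPU would use the
# bit-reversed byte for the same instruction.
scrambled = permute_bits(0x3E, PERM)
print(f"LD A,n: real 0x3E -> imagined 0x{scrambled:02X}")
```

Because a bit permutation is a bijection on the 256 byte values, a specification written against the scrambled encodings stays unambiguous while no longer matching any published opcode table.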

throwa356262 10 hours ago||

I think you are onto something here.

I tried creating an emulator for a CPU that is very well known but lacks working open source emulators.

Claude, Codex and Gemini were very good at starting something that looked great but all failed to reach a working product. They all ended up in a loop where fixing one issue caused something else to break and could never get out of it.

stuaxo 6 hours ago||||

When they get stuck, I find that adding debug output the model can access helps. Sometimes you also need to add something to the prompt to tell it to avoid a particular approach at that point.
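One possible form of such debug output is a per-instruction trace that can be compared against a known-good trace, so the model sees the first step where things go wrong rather than just a failing program. The sketch below is an assumption about how that could look: the function name `first_divergence` and the trace line format are hypothetical, not something described in the thread or the article.

```python
from typing import Optional, Sequence


def first_divergence(expected: Sequence[str], actual: Sequence[str]) -> Optional[int]:
    """Return the index of the first trace line that differs, or None if equal."""
    for i, (exp_line, act_line) in enumerate(zip(expected, actual)):
        if exp_line != act_line:
            return i
    # One trace is a prefix of the other: they diverge where the shorter ends.
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None


# Hypothetical trace format: program counter and accumulator after each step.
reference = ["PC=0000 A=00", "PC=0002 A=05", "PC=0003 A=06"]
candidate = ["PC=0000 A=00", "PC=0002 A=05", "PC=0003 A=04"]

step = first_divergence(reference, candidate)
if step is not None:
    print(f"first mismatch at step {step}: "
          f"expected {reference[step]!r}, got {candidate[step]!r}")
```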

dboreham 1 hour ago||||

Interesting. When I had Claude write a language transpiler it always checked that tests passed before declaring a feature ready for PR. There was never a case where it gave up on achieving that goal.

antirez 10 hours ago|||

Please tell me what CPU it is. I would give it a try. I strongly doubt that a very well documented CPU can't be emulated by writing the code with modern AIs.

PontifexMinimus 12 hours ago|||

> try using obscure CPUs

Better still invent a CPU instruction set, and get it to write an emulator for that instruction set in C.

Then invent a C-like HLL and get it to write a compiler from your HLL to your instruction set.
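As a rough illustration of how small an invented instruction set can be, here is a sketch of a toy accumulator machine and its emulator. Everything in it is hypothetical (the opcode numbers, the fixed two-byte instruction format, the name `run`), and it is written in Python for brevity rather than the C suggested above.

```python
# Invented instruction set: every instruction is two bytes, opcode then operand.
HALT, LOADI, ADDI, SUBI, JNZ = range(5)


def run(program: bytes, max_steps: int = 10_000) -> int:
    """Execute the program and return the accumulator when HALT is reached."""
    acc = 0  # 8-bit accumulator
    pc = 0   # program counter (byte offset into the program)
    for _ in range(max_steps):
        op, arg = program[pc], program[pc + 1]
        pc += 2
        if op == HALT:
            return acc
        elif op == LOADI:   # acc = immediate
            acc = arg
        elif op == ADDI:    # acc += immediate, wrapping at 8 bits
            acc = (acc + arg) & 0xFF
        elif op == SUBI:    # acc -= immediate, wrapping at 8 bits
            acc = (acc - arg) & 0xFF
        elif op == JNZ:     # jump to operand address if acc is non-zero
            if acc != 0:
                pc = arg
        else:
            raise ValueError(f"unknown opcode {op} at offset {pc - 2}")
    raise RuntimeError("step limit reached without HALT")


# Count down from 3 to 0, then halt.
countdown = bytes([LOADI, 3, SUBI, 1, JNZ, 2, HALT, 0])
print(run(countdown))
```

Since no code for such a made-up machine can exist in any training set, whatever the model produces for it has to come from the specification it is given.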

abainbridge 11 hours ago|||

> try using obscure CPUs

I tried asking Gemini and ChatGPT, "What opcode has the value 0x3c on the Intel 8048?"

They were both wrong. The datasheet with the correct encodings is easily found online. And there are several correct open source emulators, eg MAME.

bsoles 4 hours ago|||

Even on a specific STM microcontroller (STM32G031), the LLM tools invent non-existent registers and then apologize when I point it out. And conversely, they write code for an entire algorithm (CRC, for example) when hardware support already exists on the chip.

stuaxo 6 hours ago||||

Think of the answer to "What opcode has the value 0x3c on the Intel 8048" as a PNG image, and of the LLM as a very compressed JPEG of it. It will only give a very approximate answer. But you can give it explicit tools to look things up.

yomismoaqui 10 hours ago|||

If the LLM doesn't have a websearch tool your test doesn't make any sense.

An LLM by itself is like a lossy image of all text on the internet.

deniska 10 hours ago||

Just some more parameters, and it would overfit that specific PDF too.

kamranjon 9 hours ago|||

I thought this part of the write-up was interesting:

"This is, I think, in contradiction with the idea that LLMs are memorizing the whole training set and uncompress what they have seen. LLMs can memorize certain over-represented documents and code, but while they can extract such verbatim parts of the code if prompted to do so, they don’t have a copy of everything they saw during the training set, nor they spontaneously emit copies of already seen code, in their normal operation."

Can't things basically get baked into the weights when trained on enough iterations, and isn't this the basis for a lot of plagiarism issues we saw with regards to code and literature? It seems like this is maybe downplaying the unattributed use of open source code when training these models.

dist-epoch 13 hours ago||

If you did that, comments would be "it's just a bit shuffle of the encodings, of course it can manage that, but how about we do totally random encodings..."

ralferoo 13 hours ago||

That's true, but I still think it'd be an interesting experiment to see how much it actually follows the specification vs how much it hallucinates by plagiarising from existing code.

Probably bonus points for telling it that you're emulating the well known ZX Spectrum and then describing something entirely different, to see whether it just treats that name as an arbitrary label or whether it significantly influences its code generation.

But you're right of course, instruction decoding is such a relatively small portion of a CPU that the differences would be quite limited if all the other details remained the same. That's why a completely hypothetical system is better.

le-mark 10 hours ago|

Who else had AI implement an emulator? Raises hand. A 6502 emulator in JavaScript was my first Gemini experiment.
