ralferoo 16 hours ago

The problem is that it will have been trained on multiple open source spectrum emulators. Even "don't access the internet" isn't going to help much if it can parrot someone else's emulator verbatim just from training.

Maybe a more sensible challenge would be to describe a system that hasn't previously been emulated before (or had an emulator source released publicly as far as you can tell from the internet) and then try it.

For fun, try using obscure CPUs giving it the same level of specification as you needed for this, or even try an imagined Z80-like but swapping the order of the bits in the encodings and different orderings for the ALU instructions and see how it manages it.

throwa356262 13 hours ago | parent | next [-]

I think you are onto something here.

I tried creating an emulator for a CPU that is very well known but lacks working open source emulators.

Claude, Codex and Gemini were all very good at starting something that looked great, but all failed to reach a working product. They each ended up in a loop where fixing one issue caused something else to break, and they could never get out of it.

stuaxo 9 hours ago | parent | prev | next [-]

When they get stuck, I find it helps to add debug output that the model can access. Sometimes you also need to add something to the prompt telling it to avoid a particular approach at a particular point.

antirez 13 hours ago | parent | prev | next [-]

Please tell me what CPU it is. I would give it a try. I strongly doubt that a very well documented CPU can't be emulated by writing the code with modern AIs.

dboreham 3 hours ago | parent | prev [-]

Interesting. When I had Claude write a language transpiler it always checked that tests passed before declaring a feature ready for PR. There was never a case where it gave up on achieving that goal.

PontifexMinimus 15 hours ago | parent | prev | next [-]

> try using obscure CPUs

Better still, invent a CPU instruction set and get it to write an emulator for that instruction set in C.

Then invent a C-like HLL and get it to write a compiler from your HLL to your instruction set.

abainbridge 14 hours ago | parent | prev | next [-]

> try using obscure CPUs

I tried asking Gemini and ChatGPT, "What opcode has the value 0x3c on the Intel 8048?"

They were both wrong. The datasheet with the correct encodings is easily found online. And there are several correct open source emulators, eg MAME.

bsoles 6 hours ago | parent | next [-]

Even on a specific STM microcontroller (STM32G031), the LLM tools invent non-existent registers and then apologize when I point them out. Conversely, they write code for an entire algorithm (CRC, for example) when hardware support already exists on the chip.

stuaxo 9 hours ago | parent | prev | next [-]

Think of the answer to "What opcode has the value 0x3c on the Intel 8048?" as a PNG image, and the LLM as a very compressed JPEG of it: you will only get an approximate answer. But you can give it explicit tools to look things up.

yomismoaqui 13 hours ago | parent | prev [-]

If the LLM doesn't have a websearch tool your test doesn't make any sense.

An LLM by itself is like a lossy image of all the text on the internet.

deniska 13 hours ago | parent [-]

With just some more parameters, it would overfit that specific PDF too.

kamranjon 12 hours ago | parent | prev | next [-]

I thought this part of the write-up was interesting:

"This is, I think, in contradiction with the idea that LLMs are memorizing the whole training set and uncompress what they have seen. LLMs can memorize certain over-represented documents and code, but while they can extract such verbatim parts of the code if prompted to do so, they don’t have a copy of everything they saw during the training set, nor they spontaneously emit copies of already seen code, in their normal operation."

Can't things basically get baked into the weights when trained on enough iterations, and isn't this the basis for a lot of the plagiarism issues we saw with regard to code and literature? It seems like this is maybe downplaying the unattributed use of open source code when training these models.

dist-epoch 16 hours ago | parent | prev [-]

If you did that, comments would be "it's just a bit shuffle of the encodings, of course it can manage that, but how about we do totally random encodings..."

ralferoo 15 hours ago | parent [-]

That's true, but I still think it'd be an interesting experiment to see how much it actually follows the specification vs how much it hallucinates by plagiarising from existing code.

Probably bonus points for telling it that you're emulating the well known ZX Spectrum and then describing something entirely different, to see whether it just treats that name as an arbitrary label or whether it significantly influences its code generation.

But you're right of course: instruction decoding is a relatively small portion of a CPU, so the differences would be quite limited if all the other details remained the same. That's why a completely hypothetical system is better.