adrian_b 9 hours ago

This CPU simulator does not attempt to achieve the maximum speed that could be obtained when simulating a CPU on a GPU.

For that a completely different approach would be needed, e.g. implementing something akin to qemu, where each CPU instruction would be translated into a graphics shader program. On many older GPUs it is impossible or difficult to launch a shader from inside another shader (rather than from the CPU), but where this is possible one could obtain a CPU emulation many orders of magnitude faster than what is demonstrated here.

Instead of going for speed, the project demonstrates a simpler self-contained implementation based on the same kind of neural networks used for ML/AI, which might work even on an NPU, not only on a GPU.

Because it uses hardware execution units that were never meant for this task, the speed is modest and the speed ratios between different kinds of instructions are odd, but it is nonetheless an impressive achievement to simulate the complete AArch64 ISA by such means.

5o1ecist 8 hours ago | parent [-]

> where each CPU instruction would be translated into a graphic shader program

You really think having a shader per CPU instruction is going to get you closer to the highest possible speed one can achieve?

adrian_b 4 hours ago | parent | next [-]

You could coalesce multiple instructions per shader, but even with a single CPU instruction per shader (each translated into a sequence of GPU instructions), you could reach speeds orders of magnitude greater than this neural-network implementation, by using the arithmetic-logic execution units of the GPU.

Once translated, the shader programs would be reused. All of this could be fitted into qemu, which already emulates a CPU by generating a short program for each guest instruction, compiling it, caching the resulting executable function, and running it whenever interpretation of the emulated program reaches that instruction.
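A toy sketch of that translate-and-cache loop, with an invented three-instruction mini-ISA and Python closures standing in for compiled code (in a GPU port, the translator would emit a shader or CUDA kernel instead):

```python
# Toy model of qemu-style translate-and-cache execution.
# The mini-ISA, opcodes, and register names here are made up for
# illustration; only the caching structure mirrors the real scheme.

def compile_block(program, pc):
    """Translate the guest instruction at `pc` into a host-executable
    function. Each translated function updates the register file and
    returns the next program counter, handing control back to the
    dispatch loop."""
    op, a, b = program[pc]
    if op == "add":            # a += b (register += register)
        def fn(regs):
            regs[a] += regs[b]
            return pc + 1
        return fn
    if op == "jz":             # jump to b if register a is zero
        def fn(regs):
            return b if regs[a] == 0 else pc + 1
        return fn
    raise ValueError(f"unknown op: {op!r}")

def run(program, regs):
    cache = {}                 # translation cache, keyed by guest PC
    pc = 0
    while pc < len(program):
        if pc not in cache:    # translate once; later visits reuse it
            cache[pc] = compile_block(program, pc)
        pc = cache[pc](regs)
    return regs

# Count r0 down to zero: the loop body re-executes cached translations.
program = [
    ("jz", "r0", 3),           # 0: exit when r0 == 0
    ("add", "r0", "r1"),       # 1: r0 += r1 (r1 holds -1)
    ("jz", "zero", 0),         # 2: unconditional jump back ("zero" reg)
]
regs = run(program, {"r0": 3, "r1": -1, "zero": 0})
print(regs["r0"])              # -> 0
```

The point of the cache is that translation cost is paid once per instruction, while hot loops run the already-compiled code repeatedly.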

In qemu, one could replace the native-code compiler backend with a GPU compiler, targeting either CUDA or a shading language, depending on the GPU. The compiled shaders could then be loaded into GPU memory, where, on a GPU recent enough to support device-side launches, they could launch one another.

Eventually, one might even use a modified qemu running on the CPU to bootstrap a qemu plus shader compiler that have themselves been translated to run on the GPU, so that the entire CPU simulation runs on the GPU.

koolala 6 hours ago | parent | prev [-]

If it's bindless and pre-compiled, why not? What's a faster way?