5o1ecist 10 hours ago
> where each CPU instruction would be translated into a graphic shader program

Do you really think having a shader per CPU instruction is going to get you closer to the highest possible speed one can achieve?
adrian_b 5 hours ago
You could coalesce multiple instructions per shader, but even with a single CPU instruction per shader (each translated to a sequence of GPU instructions), you could reach speeds orders of magnitude greater than this neural network implementation, by using the arithmetic-logic execution units of the GPU. Once translated, the shader programs would be reused.

All this could be inserted in qemu, which emulates a CPU by translating guest instructions into short host-code sequences; the resulting executable functions are cached and executed while the program for the emulated CPU runs. In qemu, one could replace the native CPU compiler with a GPU compiler, either for CUDA or for a graphics shader language, depending on the target GPU. The compiled shaders could then be loaded into GPU memory, where, if the GPU is recent enough to support this feature, they could launch one another.

Eventually, one might be able to use a modified qemu running on the CPU to bootstrap a qemu plus a shader compiler that have been translated to run on the GPU, so that the entire simulation of a CPU is done on the GPU.
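To make the "translate once, then reuse" idea concrete, here is a minimal CPU-side sketch in Python. It is not qemu's API and it does not touch a GPU: the toy three-instruction ISA, `translate`, `TranslationCache` and `run_program` are all hypothetical names invented for illustration. In the scheme described above, `translate` would emit and compile a shader or CUDA kernel instead of returning a Python closure.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy guest ISA: (opcode, dst, src_a, src_b_or_immediate).
Instr = Tuple[str, int, int, int]
Regs = List[int]
Compiled = Callable[[Regs], None]

MASK = 0xFFFFFFFF  # emulate 32-bit registers


def translate(instr: Instr) -> Compiled:
    """Turn one guest instruction into a host callable.

    Stand-in for the step that would generate a GPU program.
    """
    op, d, a, b = instr
    if op == "addi":      # regs[d] = regs[a] + immediate
        def run(regs: Regs) -> None:
            regs[d] = (regs[a] + b) & MASK
    elif op == "add":     # regs[d] = regs[a] + regs[b]
        def run(regs: Regs) -> None:
            regs[d] = (regs[a] + regs[b]) & MASK
    elif op == "xor":     # regs[d] = regs[a] ^ regs[b]
        def run(regs: Regs) -> None:
            regs[d] = regs[a] ^ regs[b]
    else:
        raise ValueError(f"unknown opcode: {op}")
    return run


class TranslationCache:
    """Caches translated instructions so each one is translated only once."""

    def __init__(self) -> None:
        self.cache: Dict[Instr, Compiled] = {}
        self.translations = 0  # counts cache misses

    def lookup(self, instr: Instr) -> Compiled:
        fn = self.cache.get(instr)
        if fn is None:
            fn = translate(instr)
            self.cache[instr] = fn
            self.translations += 1
        return fn


def run_program(program: List[Instr], regs: Regs,
                cache: TranslationCache, repeat: int = 1) -> Regs:
    """Execute the program `repeat` times, reusing cached translations."""
    for _ in range(repeat):
        for instr in program:
            cache.lookup(instr)(regs)
    return regs


if __name__ == "__main__":
    prog = [("addi", 0, 0, 1), ("add", 1, 1, 0), ("xor", 2, 1, 0)]
    tc = TranslationCache()
    print(run_program(prog, [0, 0, 0, 0], tc, repeat=3), tc.translations)
```

Running the three-instruction loop three times executes nine instructions but performs only three translations; the cost of translation is paid once per distinct instruction, which is the property the comment relies on.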
koolala 8 hours ago
If it's bindless and pre-compiled, why not? What's a faster way?