positron26 3 days ago
IMO, it's a matter of time before an x86 or RISC-V extension shows up to begin the inevitable unification of GPU and SIMD in one ISA. NUMA work and clustering over CCXs and sockets are paving the way for software support in the OS. The question is: what makes as much of Vulkan, OpenCL, and CUDA go away as possible?
jauntywundrkind 3 days ago
The vector-based SIMD of RISC-V is very neat. Very hard, but also very neat. Rather than having a fixed instruction for a specific case like "take 4 fp32 and multiply by 3 fp32", then needing a new instruction for fp64, then a new one for fp32 x fp64, then a new one for 4 x 4, it generalizes the instructions to be more data-shape agnostic: here's a cross product operation, you tell us what the vector lengths are going to be, and the hardware figures it out.

I also really enjoyed the Stream Semantic Registers paper, which makes load/store implicit in some ops and adds counters that can walk forward and back automatically, so that you can loop immediately, start on the next element, and have the results dropped into the next result slot. This enables near-DSP levels of instruction density: the code can be more ops-focused rather than having to spend instructions loading and storing at each step. https://www.research-collection.ethz.ch/bitstream/20.500.118...

I still have a bit of a hard time seeing how we bridge CPU and GPU. The whole "single program multiple executor" waves aspect of the GPU is spiritually just launching a bunch of tasks for a job, but I still struggle to see an eventual convergence point. The GPU remains a semi-mystical device to me.
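
To make the "you tell us the length, the hardware figures it out" part concrete, here is a rough sketch of the strip-mining loop that vector-length-agnostic code tends to use. It is plain Python, not real RVV: `set_vl` is a hypothetical stand-in for roughly what `vsetvli` does (grant at most `vlmax` elements per pass), and `vec_scale_add` and `vlmax` are names made up for illustration.

```python
def set_vl(remaining, vlmax):
    """Hypothetical stand-in for RVV's vsetvli: elements handled this pass."""
    return min(remaining, vlmax)


def vec_scale_add(a, x, y, vlmax):
    """Compute a * x[i] + y[i] in chunks of at most vlmax elements.

    Returns the result list and the number of passes ("vector instructions")
    the loop needed. Assumes len(x) == len(y).
    """
    n = len(x)
    out = [0.0] * n
    i = 0
    passes = 0
    while i < n:
        vl = set_vl(n - i, vlmax)  # ask the "hardware" how much it will take
        # One pass models one vector op working on vl elements at once.
        out[i:i + vl] = [a * xi + yi
                         for xi, yi in zip(x[i:i + vl], y[i:i + vl])]
        i += vl
        passes += 1
    return out, passes


x = [float(k) for k in range(10)]
y = [1.0] * 10

# Same loop, two imaginary machines: narrow (4 lanes) and wide (16 lanes).
narrow, narrow_passes = vec_scale_add(2.0, x, y, vlmax=4)
wide, wide_passes = vec_scale_add(2.0, x, y, vlmax=16)

print(narrow == wide, narrow_passes, wide_passes)  # True 3 1
```

The loop body never mentions the machine's vector width, so the same code gives the same answer on both "machines" and simply takes fewer passes on the wider one. A fixed-width SIMD version would bake the 4 or the 16 into the instructions themselves.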