| ▲ | blep-arsh a day ago | |
There are specialized computation kernels compiled for NPUs. A high-level program (that uses ONNX or CoreML, for example) can decide whether to run the computation using CPU code, a GPU kernel, or an NPU kernel or maybe use multiple devices in parallel for different parts of the task, but the low-level code is compiled separately for each kind of hardware. So it's somewhat abstracted and automated by wrapper libraries but still up to the program ultimately. | ||