fooker 4 hours ago
The comparison is often against plain old linear code. For example, one SIMD instruction vs. multiple arithmetic instructions.
We have fifty years of CPU design optimizing for this kind of code. More often than not, you'll find it works better than vector instructions in practice.
The concept behind vector instructions is great, and it starts to pay off at larger widths like 512 bits. But it's extremely tricky to take advantage of that much SIMD, whether with a compiler or by hand.
pjmlp 3 hours ago | parent
Yet there are gains from doing, e.g., string searches with SIMD, which you naturally aren't going to do in CUDA.
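For illustration only, here is a rough sketch of the kind of SIMD byte search meant here. It is not taken from any particular library. The helper name `find_byte` is hypothetical, the vector path assumes x86 SSE2 and the GCC/Clang builtin `__builtin_ctz`, and on other targets only the scalar loop is compiled.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical helper: returns the index of the first occurrence of
 * `needle` in buf[0..len), or `len` if it does not occur. */
size_t find_byte(const uint8_t *buf, size_t len, uint8_t needle)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Broadcast the needle into all 16 byte lanes. */
    const __m128i pattern = _mm_set1_epi8((char)needle);
    for (; i + 16 <= len; i += 16) {
        /* Unaligned load of 16 bytes, then compare all lanes at once. */
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, pattern));
        if (mask != 0) {
            /* The lowest set bit marks the first matching byte. */
            return i + (size_t)__builtin_ctz((unsigned)mask);
        }
    }
#endif
    /* Scalar tail (and the whole search when SSE2 is unavailable). */
    for (; i < len; i++) {
        if (buf[i] == needle) {
            return i;
        }
    }
    return len;
}
```

One compare plus one movemask checks 16 bytes per iteration instead of one, which is where the gain for this kind of search comes from. Whether it beats a tuned libc `memchr` on a given machine would need measuring.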