hansvm 5 days ago
Or at runtime, if you'd like. You can create a generic binary that runs faster on supported platforms.
vlovich123 5 days ago
> Or at runtime, if you'd like

You have to be careful about how you do it, because those runtime checks can easily swamp the performance gains you get from SIMD.

> also get the block size according to CPU features at compile time with `std.simd.suggestVectorSize()`

You have to be careful with this too: I believe `std.simd.suggestVectorSize` returns a value for the minimum SIMD version you're targeting, which can be suboptimal for portable binaries. You probably want a mix: carefully compute the vector size for the current platform once, globally, and have multiple compiled dispatch paths in your binary that you pick between based on that value, letting the CPU prefetcher hide the cost of a check before each invocation.
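
For illustration, here is a minimal sketch of that "detect once, dispatch many times" pattern. It is written in Rust rather than Zig so that it only relies on calls whose signatures are certain (`OnceLock`, `is_x86_feature_detected!`). The names `sum`, `sum_scalar`, `sum_avx2`, `sum_avx2_entry`, `select_sum` and `SUM_IMPL` are hypothetical, and the AVX2 path assumes the compiler auto-vectorizes the loop rather than using hand-written intrinsics.

```rust
use std::sync::OnceLock;

/// Signature shared by every compiled variant of the kernel.
type SumFn = fn(&[f32]) -> f32;

/// Portable fallback: built for the baseline target, runs everywhere.
fn sum_scalar(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

/// Same kernel compiled with AVX2 enabled. Eight independent accumulators
/// give the compiler a shape it can map onto one 256-bit register.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[f32]) -> f32 {
    let mut acc = [0.0f32; 8];
    let chunks = xs.chunks_exact(8);
    let tail = chunks.remainder();
    for c in chunks {
        for (a, x) in acc.iter_mut().zip(c) {
            *a += *x;
        }
    }
    acc.iter().sum::<f32>() + tail.iter().sum::<f32>()
}

/// Safe wrapper so the AVX2 variant fits the common `SumFn` pointer type.
#[cfg(target_arch = "x86_64")]
fn sum_avx2_entry(xs: &[f32]) -> f32 {
    // SAFETY: `select_sum` only returns this function after the runtime
    // check confirmed that the CPU supports AVX2.
    unsafe { sum_avx2(xs) }
}

/// The expensive part (CPU feature detection) lives here and runs once.
fn select_sum() -> SumFn {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return sum_avx2_entry;
        }
    }
    sum_scalar
}

/// Process-wide cache of the chosen implementation.
static SUM_IMPL: OnceLock<SumFn> = OnceLock::new();

/// Public entry point: after the first call, dispatch costs one load of
/// the cached pointer plus an indirect call.
pub fn sum(xs: &[f32]) -> f32 {
    let f = *SUM_IMPL.get_or_init(select_sum);
    f(xs)
}
```

Callers just use `sum(&data)`. The per-call overhead is the cached pointer load and the indirect call, so this only pays off when each invocation does enough work to amortize it; for tiny inputs it is usually better to hoist the dispatch out of the hot loop. Note that the two paths add the floats in a different order, so their results can differ in the last bits for non-integer data.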