Const-me 4 days ago
I recently needed a decently performing FFT. Instead of implementing Cooley-Tukey, I realized that the brute-force version essentially computes two vector×matrix products, so I interleaved and reshaped the matrices for sequential full-vector loads and wrote the brute-force version with AVX1 and FMA3 intrinsics. It's good enough for my use case: moderately sized FFTs where the matrices fit in L2 cache.
HarHarVeryFunny 4 days ago
I'm curious why you wouldn't just use a library like FFTW or Intel's IPP (or NVIDIA's cuFFT, if applicable)?