kllrnohj 5 hours ago

> Why is the NN-only portion almost as fast on an iPhone 17 compared to a V100 when the V100 has 4x the FP throughput?

It might have a sequential section, a block size too small to fill a V100, a large chunk of CPU-only work, or any number of things like that.
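
To put a rough number on the sequential-section idea, here is a minimal Amdahl's law sketch. The function name and the fractions are made up for illustration; they are not measurements of this model or of either device. Only the 4x figure comes from the question above.

```python
def amdahl_speedup(parallel_fraction: float, parallel_speedup: float) -> float:
    """Overall speedup when only part of the runtime benefits from faster hardware.

    parallel_fraction: share of the baseline runtime that scales with FP throughput (0..1).
    parallel_speedup:  how much faster that share runs on the bigger device.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


# Hypothetical fractions, assumed purely for illustration.
for fraction in (1.0, 0.9, 0.5, 0.2):
    print(f"{fraction:.0%} scalable -> {amdahl_speedup(fraction, 4.0):.2f}x overall")
```

If only half of the runtime scales with FP throughput, a 4x faster device ends up well under 2x faster overall, and it gets closer to parity the more time goes to sequential or CPU-only work.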