kllrnohj 5 hours ago
> Why is the NN-only portion almost as fast on an iPhone 17 compared to a V100 when the V100 has 4x the FP throughput?

It might have a sequential section, a block size that struggles to fill a V100, a large chunk of CPU-only work, or any number of other things like that.
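To put rough numbers on the sequential/CPU-only point: by Amdahl's law, if only part of the runtime scales with raw FP throughput, a 4x faster part buys far less than 4x overall. Here's a quick sketch. The fractions are made up for illustration, since nobody outside knows the real split for this workload. It also ignores the under-filled launch case (small blocks leaving many of the V100's SMs idle), which would effectively shrink that 4x ratio even further.

```python
def amdahl_speedup(parallel_fraction: float, throughput_ratio: float) -> float:
    """Overall speedup when only `parallel_fraction` of the runtime scales with
    raw FP throughput and the rest (serial or CPU-only work) does not."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / throughput_ratio)


# Hypothetical parallel fractions; the real split for this workload is unknown.
# 4.0 is the FP throughput ratio quoted in the question.
for p in (1.0, 0.8, 0.5, 0.2):
    print(f"parallel fraction {p:.1f}: {amdahl_speedup(p, 4.0):.2f}x")
```

If everything is parallel you get the full 4x. If only 20% of the time scales with FP throughput, you get about 1.18x, which would look like "almost as fast."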