Nah, memory is still the bottleneck. Kernel performance is already pretty good, but CPU memory bandwidth is still dramatically lower than GPU memory bandwidth.