itamos 2 days ago
On one hand, exploiting shared-memory properties to speed up inference sounds promising. On the other hand, the well-established inference engines are probably already optimized to overlap compute and communication. If so, host-device copies are largely hidden behind compute and are unlikely to be the bottleneck worth tackling.
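A rough back-of-the-envelope sketch of why overlap matters, assuming a toy two-stage pipeline: one copy engine and one compute stream, unlimited staging buffers, and made-up per-chunk timings (the function names and numbers below are hypothetical, not taken from any real engine):

```python
# Toy model: each chunk must be copied host->device before it can be computed on.
# Assumes one copy engine and one compute stream that can run concurrently,
# with unlimited buffers so copies never wait on compute.

def serial_time(copy_ms, compute_ms):
    """No overlap: every copy and every compute step runs back to back."""
    return sum(copy_ms) + sum(compute_ms)

def overlapped_time(copy_ms, compute_ms):
    """Copies of later chunks proceed while earlier chunks are being computed."""
    copy_end = 0.0
    compute_end = 0.0
    for c, k in zip(copy_ms, compute_ms):
        copy_end += c  # copy engine finishes this chunk's transfer
        # compute starts once both the data has arrived and the previous chunk is done
        compute_end = max(compute_end, copy_end) + k
    return compute_end

copies = [2.0] * 4    # hypothetical ms per chunk transfer
computes = [5.0] * 4  # hypothetical ms per chunk compute

print(serial_time(copies, computes))      # 28.0
print(overlapped_time(copies, computes))  # 22.0
print(sum(computes))                      # 20.0, i.e. zero-copy via shared memory
```

In this compute-bound toy case, removing copies entirely saves 8 ms against a naive serial engine but only 2 ms (the first, unhidden copy) against one that already overlaps, which is the point above. The picture changes if transfers dominate compute.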