Decoupling Compute and Memory for Async GPUs
7 points by yiyingzhang 9 hours ago | 2 comments
Cool open-source project that introduces a new programming model for decoupling compute and memory on NVIDIA GPUs that support asynchronous memory operations (e.g., Hopper). 12% perf improvement over SOTA and 67% less kernel code. Paper: "VDCores: Resource Decoupled Programming and Execution for Asynchronous GPU" arXiv:2605.03190
bobbyzhu2008 9 hours ago
67% less kernel code is the more interesting number here: Hopper's async capabilities have been underutilized largely because the programming model is painful. Curious how it handles cases where compute and memory phases aren't cleanly separable.
jhap 7 hours ago
This seems like a better version of CUDA, for Hopper GPUs?