zozbot234 5 hours ago
If anything, GPUs combine large private per-compute-unit address spaces with a separate shared/global memory, which doesn't mesh well with linear memory access, just with high locality. You can kinda get to the same arrangement on a CPU by pushing NUMA (Non-Uniform Memory Access: only the "global" memory is truly unified on a GPU!) to the extreme, but that's quite uncommon. "Compute-in-memory" is a related idea that points to the same constraint: these days you want to maximize spatial locality, because moving data in bulk is an expensive operation that burns power.
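A minimal CUDA sketch of that arrangement (illustrative only; block size and kernel name are my own choices): each thread block stages a tile of global memory into its private on-chip shared memory with one bulk read, reduces it there without touching global memory again, and writes a single result back.

```cuda
#include <cuda_runtime.h>

#define TILE 256  // threads per block; assumed launch config, not from the comment

// Per-block partial sum: one bulk global read in, work in shared memory,
// one global write out. Launch with <<<numBlocks, TILE>>>.
__global__ void block_sum(const float *in, float *out, int n) {
    __shared__ float tile[TILE];              // per-block private scratchpad
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;  // bulk read from global memory
    __syncthreads();

    // Tree reduction entirely within shared memory: no further global traffic
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            tile[threadIdx.x] += tile[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];            // single write back to global memory
}
```

The point of the pattern is exactly the locality constraint above: the expensive operation is the global-memory traffic, so the kernel is shaped to do it in as few bulk transfers as possible, with everything else happening in the block's private address space.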