bob1029 | 10 months ago
Time and throughput are inseparable quantities. I would interpret "local core throughput" as the subclass of timing concerns in which everything happens in a smaller physical space.

A different way to restate the question: what are the categories of problems for which the gains from parallelism more than compensate for cross-thread communication time and the loss of cache locality? How often does it make sense to run each thread ~100x slower so that we can leverage some aggregate state?

The only headline use cases I can come up with for using more than <modest #> of threads are hosting VMs in the cloud and running simulations/rendering in an embarrassingly parallel manner. I don't think gaming benefits much beyond a certain point - humans have their own timing issues. Hosting a web app and ferrying the user's state between 10 different physical cores under an async call stack is likely not an ideal use of the computational resources, and this scenario will only get worse as inter-thread latency increases.
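As a rough illustration of that trade-off, here's a minimal C++ sketch (thread and iteration counts are arbitrary, and a real benchmark would pin threads and keep the optimizer from collapsing the local loop). It compares threads that all hammer one shared atomic counter, where every increment drags the cache line across cores, against threads that accumulate locally and merge once at the end:

    #include <atomic>
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <thread>
    #include <vector>

    int main() {
        constexpr int kThreads = 8;                   // arbitrary
        constexpr std::uint64_t kIters = 10'000'000;  // arbitrary

        auto run = [&](bool shared_state, const char* label) {
            std::atomic<std::uint64_t> total{0};
            std::vector<std::thread> workers;
            auto t0 = std::chrono::steady_clock::now();
            for (int t = 0; t < kThreads; ++t) {
                workers.emplace_back([&] {
                    if (shared_state) {
                        // Every increment contends for the same cache line,
                        // so it ping-pongs between cores.
                        for (std::uint64_t i = 0; i < kIters; ++i)
                            total.fetch_add(1, std::memory_order_relaxed);
                    } else {
                        // Accumulate locally, touch the shared counter once.
                        std::uint64_t local = 0;
                        for (std::uint64_t i = 0; i < kIters; ++i)
                            local += 1;
                        total.fetch_add(local, std::memory_order_relaxed);
                    }
                });
            }
            for (auto& w : workers) w.join();
            auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                          std::chrono::steady_clock::now() - t0).count();
            std::printf("%-13s total=%llu  %lld ms\n", label,
                        (unsigned long long)total.load(), (long long)ms);
        };

        run(true,  "shared atomic");
        run(false, "thread-local");
    }

Same total work in both runs; the only difference is how often the aggregate state crosses cores.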
bobmcnamara | 9 months ago
> How often does it make sense to run each thread ~100x slower so that we can leverage some aggregate state?

What I mean is that even if cross-core latency stayed constant while IPC increased, the opportunity cost of waiting on something to cross cores has still gone up. I could also see the latency itself going up over time.

It's wild to me to have seen early cross-CPU buses on motherboards become integrated into the chips, and then brought back out again between chips that each hold multiple cores. There's no way to do full-mesh N(N-1)/2 links anymore; N went up too fast.
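To make both points concrete, here's a tiny back-of-the-envelope sketch (the latency, clock, and IPC numbers are assumptions for illustration, not measurements): at a fixed cross-core round trip, higher IPC means more instructions forgone per wait, and a full mesh between N endpoints needs N(N-1)/2 links.

    #include <cstdio>

    int main() {
        // Opportunity cost: instructions one core could have retired while
        // it waited on a single cross-core round trip (all numbers assumed).
        const double latency_ns = 100.0;  // assumed core-to-core round trip
        const double clock_ghz  = 4.0;    // assumed clock; GHz == cycles/ns
        for (double ipc : {1.0, 2.0, 4.0, 6.0}) {
            double forgone = latency_ns * clock_ghz * ipc;  // cycles * IPC
            std::printf("IPC %.0f -> ~%.0f instructions forgone per wait\n",
                        ipc, forgone);
        }

        // Full-mesh wiring: a dedicated link between every pair of N
        // endpoints needs N*(N-1)/2 links, which grows quadratically.
        for (int n : {2, 4, 8, 16, 64}) {
            std::printf("N=%2d -> %d point-to-point links\n",
                        n, n * (n - 1) / 2);
        }
    }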