bob1029 | 10 months ago
NUMA feels like a really big deal on AMD now. I recently refactored an evolutionary algorithm from Parallel.ForEach over one gigantic population to an isolated population+simulation per thread. The difference is so dramatic (100x+) that the loss of large-scale population dynamics seems to be more than offset by the number of iterations you can achieve per unit time. Communicating information between threads of execution should be assumed to be growing more expensive (in terms of latency) as we head further in this direction. More threads is usually not the answer for most applications. Instead, we need to back up and review just how fast one thread can be when the dependent data is in the right place at the right time.
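The refactor described above can be sketched roughly as follows. This is a minimal Java illustration (the original was C# with Parallel.ForEach); the population size, generation count, mutation step, and toy fitness function are all hypothetical stand-ins. The point is the structure: each thread owns its own population and RNG, touches no shared state during the run, and results are merged only once at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class IsolatedPopulations {
    // Toy fitness: closeness to 1.0 (hypothetical objective, max = 0.0).
    static double fitness(double x) { return -Math.abs(1.0 - x); }

    // Evolve one population entirely on one thread. No shared state,
    // so the working set stays in that core's caches / local NUMA node.
    static double evolve(long seed, int popSize, int generations) {
        Random rng = new Random(seed);          // per-thread RNG, never shared
        double[] pop = new double[popSize];     // per-thread population
        for (int i = 0; i < popSize; i++) pop[i] = rng.nextDouble();
        for (int g = 0; g < generations; g++) {
            for (int i = 0; i < popSize; i++) {
                double mutated = pop[i] + (rng.nextDouble() - 0.5) * 0.1;
                if (fitness(mutated) > fitness(pop[i])) pop[i] = mutated;
            }
        }
        double best = Double.NEGATIVE_INFINITY;
        for (double x : pop) best = Math.max(best, fitness(x));
        return best;
    }

    public static void main(String[] args) throws InterruptedException {
        int nThreads = Runtime.getRuntime().availableProcessors();
        double[] results = new double[nThreads];
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < nThreads; t++) {
            final int id = t;
            Thread th = new Thread(() -> results[id] = evolve(id, 10_000, 100));
            threads.add(th);
            th.start();
        }
        for (Thread th : threads) th.join();
        // Cross-thread communication happens exactly once, here at the end.
        double best = Double.NEGATIVE_INFINITY;
        for (double r : results) best = Math.max(best, r);
        System.out.println("best fitness: " + best);
    }
}
```

In the shared-population version, every worker reads and writes one big array, so cache lines and memory pages migrate across NUMA nodes constantly; in this version each thread's data stays local, at the cost of losing global selection pressure during the run.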
Agingcoder | 10 months ago | parent
Yes - I've come to view the server as a small cluster in a box, with an internal network and the associated performance penalty once you start going "out of box".
bobmcnamara | 10 months ago | parent
Is cross-thread latency more expensive in absolute time, or more expensive relative to things like local core throughput?