jandrewrogers 2 hours ago
I can easily explain this, having worked in this space. The new languages don’t actually solve any urgent problems.

How people imagine scalable parallelism works and how it actually works doesn’t have a lot of overlap. The code is often boringly single-threaded because that is optimal for performance.

The single biggest resource limit in most HPC code is memory bandwidth. If you are not addressing this then you are not addressing a real problem for most applications. For better or worse, C++ is really good at optimizing for memory bandwidth. Most of the suggested alternative languages are not.

It is that simple. The new languages address irrelevant problems. It is really difficult to design a language that is more friendly to memory bandwidth than C++. And that is the resource you desperately need to optimize for in most cases.
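One common reading of "optimizing for memory bandwidth" is controlling data layout so that every cache line fetched is fully used. The sketch below is only an illustration of that idea, not necessarily what the commenter had in mind; the `ParticleAoS` and `ParticlesSoA` types and the function names are hypothetical. It contrasts an array-of-structs layout with a struct-of-arrays layout for a loop that reads a single field.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical particle record in array-of-structs (AoS) layout.
// With 8-byte doubles the record is typically 32 bytes, so a loop that
// reads only `mass` still pulls x, y, z through the cache with it.
struct ParticleAoS {
    double x, y, z;
    double mass;
};

// The same data in struct-of-arrays (SoA) layout: each field is stored
// contiguously, so a loop over one field touches only that field's bytes.
struct ParticlesSoA {
    std::vector<double> x, y, z;
    std::vector<double> mass;
};

// Sums mass over AoS data; roughly a quarter of the fetched bytes are used.
double total_mass_aos(const std::vector<ParticleAoS>& ps) {
    double sum = 0.0;
    for (const ParticleAoS& p : ps) {
        sum += p.mass;
    }
    return sum;
}

// Sums mass over SoA data; the loop streams through one dense array.
double total_mass_soa(const ParticlesSoA& ps) {
    double sum = 0.0;
    for (double m : ps.mass) {
        sum += m;
    }
    return sum;
}

// Converts AoS data to SoA, preserving element order.
ParticlesSoA to_soa(const std::vector<ParticleAoS>& ps) {
    ParticlesSoA out;
    out.x.reserve(ps.size());
    out.y.reserve(ps.size());
    out.z.reserve(ps.size());
    out.mass.reserve(ps.size());
    for (const ParticleAoS& p : ps) {
        out.x.push_back(p.x);
        out.y.push_back(p.y);
        out.z.push_back(p.z);
        out.mass.push_back(p.mass);
    }
    return out;
}
```

Both functions return the same sum; the difference is how many bytes cross the memory bus to compute it. Whether the SoA version is actually faster depends on the access pattern, the data size relative to cache, and the hardware, so it is worth measuring rather than assuming.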
bruce343434 2 hours ago
What does it mean to be friendly to memory bandwidth, and why does C++ excel at it, over, say, Fortran or C or Rust?
j4k0bfr 2 hours ago
I'm pretty interested in realtime computing and didn't realise C++ was considered bandwidth efficient! Coming from C, I find myself avoiding most 'new' C++ features because I can't easily figure out how they allocate without grabbing a memory profiler.
Joel_Mckay an hour ago
> C++ is really good at optimizing for memory bandwidth

In general, most modern CPU thread-safe code is still a bodge in most languages.

If folks are unfortunate enough to encounter inseparable overlapping state sub-problems, then there is no magic pixie dust to escape the computational cost. On average, attempting to parallelize this type of code can end up >30% slower on identical hardware, and a GPU memory copy exchange can make it even worse.

Sometimes, for those types of problems, a pinned-core higher clock-speed chip will win out even against a large multi-core CPU.

Thus, the answer to the mystery of why most people revert to batching k copies of a single-core-bound non-parallel version of a program is that it reduces latency, stalls, cache thrashing, I/O saturation, and interprocess communication costs.

Exchange costs only balloon higher across networks. However fast the cluster partition claims to be, the physics is still going to impose space-time constraints, and modern data centers will spend >15% of energy cost just moving stuff around networks for lower-efficiency code.

I like languages like Julia, as it implicitly abstracts the broadcast operator to handle which areas may be cleanly unrolled. However, much like Erlang/Elixir, the multi-host parallelization is not cleanly implemented... yet...

The core problem with HPC software has always been that academics are best modeled as hermit crabs with facilities. Once a lucky individual inherits a nice new shell, the pincers come out for all smaller entities who may approach with competing interests.

Best of luck, =3

"Crabs Trade Shells in the Strangest Way | BBC Earth"