jandrewrogers 2 hours ago

I can easily explain this, having worked in this space. The new languages don’t actually solve any urgent problems.

How people imagine scalable parallelism works and how it actually works doesn’t have a lot of overlap. The code is often boringly single-threaded because that is optimal for performance.

The single biggest resource limit in most HPC code is memory bandwidth. If you are not addressing this then you are not addressing a real problem for most applications. For better or worse, C++ is really good at optimizing for memory bandwidth. Most of the suggested alternative languages are not.

It is that simple. The new languages address irrelevant problems. It is really difficult to design a language that is more friendly to memory bandwidth than C++. And that is the resource you desperately need to optimize for in most cases.
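To make "friendly to memory bandwidth" a bit more concrete, here is a minimal C++ sketch (an editorial illustration, not code from the thread). It contrasts an array-of-structs layout with a struct-of-arrays layout for a loop that reads only one field. The particle types and function names are hypothetical, and the traffic estimate assumes 64-byte cache lines.

```cpp
#include <cstddef>
#include <vector>

// Array-of-structs: all of one particle's fields sit next to each other.
struct ParticleAoS {
    double x, y, z;     // position
    double vx, vy, vz;  // velocity
    double mass;
    double charge;
};

// Struct-of-arrays: each field lives in its own contiguous array.
struct ParticlesSoA {
    std::vector<double> x, y, z, vx, vy, vz, mass, charge;
    explicit ParticlesSoA(std::size_t n)
        : x(n), y(n), z(n), vx(n), vy(n), vz(n), mass(n), charge(n) {}
};

// On typical platforms sizeof(ParticleAoS) is 64 bytes, so reading the
// 8-byte mass of each particle still drags about one cache line per
// particle through the memory system.
double total_mass_aos(const std::vector<ParticleAoS>& ps) {
    double total = 0.0;
    for (const auto& p : ps) total += p.mass;
    return total;
}

// Same reduction over SoA: every byte fetched is a mass value, so the
// loop moves roughly 1/8 of the data and vectorizes easily.
double total_mass_soa(const ParticlesSoA& ps) {
    double total = 0.0;
    for (double m : ps.mass) total += m;
    return total;
}
```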

bruce343434 2 hours ago

What does it mean to be friendly to memory bandwidth, and why does C++ excel at it, over, say, Fortran or C or Rust?

lugu 6 minutes ago

The parent comment is talking about new languages; per the article, Fortran and C are doing fine. My guess is that the benefit of C++ over Rust is how it lets programmers give the compiler guarantees that go beyond the language's core semantics. See __restrict, __builtin_prefetch and __builtin_assume_aligned. The programming language is a space for conversations between compiler builders and hardware designers.
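The three extensions named above look roughly like this in practice. This is a minimal sketch assuming GCC or Clang (these are compiler extensions, not standard C++); the `axpy` function, the 64-byte alignment, and the prefetch distance of 64 elements are illustrative assumptions, not tuned values.

```cpp
#include <cstddef>

// y[i] += a * x[i]
// __restrict promises the compiler that x and y never overlap, so it can
// vectorize without emitting runtime aliasing checks.
void axpy(float a, const float* __restrict x, float* __restrict y, std::size_t n) {
    // Promise 64-byte alignment. The caller must guarantee this;
    // passing misaligned pointers is undefined behavior.
    const float* __restrict xa =
        static_cast<const float*>(__builtin_assume_aligned(x, 64));
    float* __restrict ya = static_cast<float*>(__builtin_assume_aligned(y, 64));

    for (std::size_t i = 0; i < n; ++i) {
        // Hint only: start fetching x 64 elements ahead
        // (rw = 0 means read, locality = 1 means low temporal reuse).
        if (i + 64 < n) __builtin_prefetch(xa + i + 64, 0, 1);
        ya[i] += a * xa[i];
    }
}
```

Note that `__builtin_prefetch` is a hint the hardware may ignore, while `__restrict` and `__builtin_assume_aligned` are promises that cause undefined behavior if broken.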

j4k0bfr 2 hours ago

I'm pretty interested in realtime computing and didn't realise C++ was considered bandwidth efficient! Coming from C, I find myself avoiding most 'new' C++ features because I can't easily figure out how they allocate without grabbing a memory profiler.
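One low-tech way to see when a standard container allocates, without a profiler, is to hand it a counting allocator. The sketch below assumes C++17 `std::pmr`; `CountingResource` and `count_push_back_allocations` are hypothetical names for illustration.

```cpp
#include <cstddef>
#include <memory_resource>
#include <vector>

// Pass-through memory resource that counts every request it receives,
// then forwards it to the default new/delete resource.
class CountingResource : public std::pmr::memory_resource {
public:
    std::size_t allocations = 0;
    std::size_t bytes = 0;

private:
    std::pmr::memory_resource* upstream_ = std::pmr::new_delete_resource();

    void* do_allocate(std::size_t n, std::size_t align) override {
        ++allocations;
        bytes += n;
        return upstream_->allocate(n, align);
    }
    void do_deallocate(void* p, std::size_t n, std::size_t align) override {
        upstream_->deallocate(p, n, align);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
};

// Pushes n ints into a pmr vector (optionally reserving first) and
// reports how many allocations the vector made through the resource.
std::size_t count_push_back_allocations(std::size_t n, bool reserve_first) {
    CountingResource counter;
    {
        std::pmr::vector<int> v(&counter);
        if (reserve_first) v.reserve(n);
        for (std::size_t i = 0; i < n; ++i) v.push_back(static_cast<int>(i));
    }  // v frees its buffer here, while counter is still alive
    return counter.allocations;
}
```

With `reserve_first` set, the vector should allocate exactly once; without it, the count depends on the library's growth policy.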

Narishma 37 minutes ago

I don't think there's much difference between C and C++ (and Rust, etc...) when it comes to this.

Joel_Mckay 21 minutes ago

There is, unless you are using an LLVM compiler that does naive things with code motion.

Rust is typically the slowest (often by a negligible <3%), C++ has better CUDA support, and C can be heavily optimized with inline assembly (very unforgiving to juniors).

It is also heavily tied to coding style =3

https://en.wikipedia.org/wiki/The_Power_of_10:_Rules_for_Dev...

Joel_Mckay an hour ago

> realtime computing

Even with HDL-defined accelerators, that term may not mean what people assume. =3

https://en.wikipedia.org/wiki/Latency_(engineering)

https://en.wikipedia.org/wiki/Clock_domain_crossing

https://en.wikipedia.org/wiki/Metastability_(electronics)

https://en.wikipedia.org/wiki/The_Power_of_10:_Rules_for_Dev...

https://www.youtube.com/watch?v=G2y8Sx4B2Sk

Joel_Mckay an hour ago

> C++ is really good at optimizing for memory bandwidth

In general, thread-safe code for modern CPUs is still a bodge in most languages. If folks are unfortunate enough to encounter inseparable sub-problems with overlapping state, then there is no magic pixie dust to escape the computational cost. On average, attempting to parallelize this type of code can leave it >30% slower on identical hardware, and copying memory back and forth to a GPU can make it even worse.

For those types of problems, a higher-clock-speed chip running the work on a pinned core will sometimes win out, even against a large multi-core CPU.
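As a minimal illustration of the distinction (an editorial reading of "inseparable overlapping state", not the commenter's code), compare a loop whose iterations are independent with one that has a loop-carried dependency. The logistic map is only a convenient stand-in for any step-by-step recurrence.

```cpp
#include <cstddef>
#include <vector>

// Independent iterations: each term depends only on its own input. The
// accumulation into total is a dependency too, but addition is associative
// (ignoring floating-point rounding), so the range can be split into
// partial sums on different cores and combined at the end.
double sum_of_squares(const std::vector<double>& v) {
    double total = 0.0;
    for (double x : v) total += x * x;
    return total;
}

// Loop-carried dependency: step i needs the result of step i - 1, so extra
// cores cannot start later steps early. Splitting this chain across
// threads only adds synchronization cost.
double logistic_orbit(double x0, double r, std::size_t steps) {
    double x = x0;
    for (std::size_t i = 0; i < steps; ++i) x = r * x * (1.0 - x);
    return x;
}
```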

That solves the mystery of why most people fall back to batching k copies of the single-core, non-parallel version of a program: it reduces latency, stalls, cache thrashing, I/O saturation, and interprocess communication costs.
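A sketch of the "k copies of the serial program" pattern, assuming the input splits into independent slices. In production this is usually k separate processes or a scheduler job array; threads are used here only to keep the example self-contained, and `process_batch` is a hypothetical placeholder workload.

```cpp
#include <cstdint>
#include <thread>
#include <vector>

// The unmodified single-threaded kernel (placeholder workload).
std::uint64_t process_batch(std::uint64_t begin, std::uint64_t end) {
    std::uint64_t acc = 0;
    for (std::uint64_t i = begin; i < end; ++i) acc += i % 7;
    return acc;
}

// Runs k copies of the serial kernel on disjoint slices of [0, n).
// Each copy keeps its running state in locals and writes a single result
// when it finishes, so there is no locking and no shared cache line is
// written inside the hot loop.
std::vector<std::uint64_t> run_k_copies(std::uint64_t n, unsigned k) {
    std::vector<std::uint64_t> results(k);
    std::vector<std::thread> workers;
    workers.reserve(k);
    for (unsigned w = 0; w < k; ++w) {
        std::uint64_t begin = n * w / k;
        std::uint64_t end = n * (w + 1) / k;
        workers.emplace_back([&results, w, begin, end] {
            results[w] = process_batch(begin, end);
        });
    }
    for (auto& t : workers) t.join();
    return results;
}
```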

Exchange costs only balloon across networks: however fast the cluster partition claims to be, physics still imposes space-time constraints, and modern data centers spend >15% of their energy cost just moving data around networks on behalf of less efficient code.

I like languages like Julia, as its broadcast operator implicitly abstracts which regions can be cleanly unrolled. However, much like Erlang/Elixir, its multi-host parallelization is not cleanly implemented... yet...
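For readers unfamiliar with Julia's broadcasting: a dotted expression such as `y .= a .* x .+ b` is fused into a single loop with no temporary arrays. The C++ sketch below (function names hypothetical) shows unfused and fused versions side by side; the fused one makes roughly half as many passes over memory, which ties back to the bandwidth point made at the top of the thread.

```cpp
#include <cstddef>
#include <vector>

// Unfused: materializes a temporary. x is read, tmp is written and read
// back, then y is written: about four array passes plus an allocation.
// Assumes y.size() >= x.size().
void scale_shift_unfused(double a, const std::vector<double>& x, double b,
                         std::vector<double>& y) {
    std::vector<double> tmp(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) tmp[i] = a * x[i];
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = tmp[i] + b;
}

// Fused: one pass, no temporary (read x, write y). This is roughly the
// loop that Julia's dot fusion produces. Assumes y.size() >= x.size().
void scale_shift_fused(double a, const std::vector<double>& x, double b,
                       std::vector<double>& y) {
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = a * x[i] + b;
}
```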

The core problem with HPC software has always been that academics are best modeled as hermit crabs with facilities. Once a lucky individual inherits a nice new shell, the pincers come out for any smaller entity that approaches with competing interests.

Best of luck, =3

"Crabs Trade Shells in the Strangest Way | BBC Earth"

https://www.youtube.com/watch?v=f1dnocPQXDQ