30 Years of HPC: many hardware advances, little adoption of new languages

jandrewrogers an hour ago | parent | next [-]

I can easily explain this, having worked in this space. The new languages don’t actually solve any urgent problems.

How people imagine scalable parallelism works and how it actually works don’t have a lot of overlap. The code is often boringly single-threaded because that is optimal for performance.

The single biggest resource limit in most HPC code is memory bandwidth. If you are not addressing this then you are not addressing a real problem for most applications. For better or worse, C++ is really good at optimizing for memory bandwidth. Most of the suggested alternative languages are not.

It is that simple. The new languages address irrelevant problems. It is really difficult to design a language that is more friendly to memory bandwidth than C++. And that is the resource you desperately need to optimize for in most cases.
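
A minimal sketch of one thing "optimizing for memory bandwidth" can mean in C++, assuming a particle-style workload where a hot loop reads only one field. This is an illustration, not code from the comment: the names (`ParticleAoS`, `ParticlesSoA`, `to_soa`, `sum_x_*`, `bytes_streamed_*`) are hypothetical, and the byte counts are a rough model that ignores cache-line granularity and prefetching.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structs: all seven fields of one particle are adjacent in memory.
struct ParticleAoS {
    double x, y, z;     // position
    double vx, vy, vz;  // velocity
    double mass;
};

// Struct-of-arrays: each field lives in its own contiguous array.
struct ParticlesSoA {
    std::vector<double> x, y, z;
    std::vector<double> vx, vy, vz;
    std::vector<double> mass;
};

// Converts AoS storage to SoA storage, field by field.
ParticlesSoA to_soa(const std::vector<ParticleAoS>& ps) {
    ParticlesSoA out;
    for (const ParticleAoS& p : ps) {
        out.x.push_back(p.x);   out.y.push_back(p.y);   out.z.push_back(p.z);
        out.vx.push_back(p.vx); out.vy.push_back(p.vy); out.vz.push_back(p.vz);
        out.mass.push_back(p.mass);
    }
    assert(out.x.size() == ps.size());
    return out;
}

// Reads only x, but the other six fields share the same cache lines,
// so they are pulled from memory as well.
double sum_x_aos(const std::vector<ParticleAoS>& ps) {
    double total = 0.0;
    for (const ParticleAoS& p : ps) total += p.x;
    return total;
}

// Reads only x, and only the x array is streamed from memory.
double sum_x_soa(const ParticlesSoA& ps) {
    double total = 0.0;
    for (double v : ps.x) total += v;
    return total;
}

// Rough model of bytes streamed from memory when only x is needed.
constexpr std::size_t bytes_streamed_aos(std::size_t n) { return n * sizeof(ParticleAoS); }
constexpr std::size_t bytes_streamed_soa(std::size_t n) { return n * sizeof(double); }
```

Both loops compute the same sum, but under this model the array-of-structs version moves about seven times as many bytes to do it. Control over layout at this level is available in C++, C, Fortran, and Rust alike, so the sketch shows the concern being described rather than settling which language handles it best.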

	▲	Joel_Mckay a few seconds ago | parent | next [-]
		> C++ is really good at optimizing for memory bandwidth

		In general, most modern thread-safe CPU code is still a bodge in most languages. If folks are unfortunate enough to encounter inseparable, overlapping-state sub-problems, then there is no magic pixie dust to escape the computational cost. On average, attempting to parallelize this type of code can end up >30% slower on identical hardware, and a GPU memory copy exchange can make it even worse. Sometimes, even compared to a large multi-core CPU, a pinned-core, higher-clock-speed chip will win out for those types of problems.

		Thus the mystery of why most people revert to batching k copies of a single-core-bound, non-parallel version of a program: it reduces latency, stalls, cache thrashing, I/O saturation, and interprocess communication costs.

		Exchange costs only balloon higher across networks. However fast the cluster partition claims to be, physics is still going to impose space-time constraints, and modern data centers will spend >15% of their energy cost just moving stuff around networks for lower-efficiency code.

		I like languages like Julia, as it explicitly abstracts the broadcast operator to handle which areas may be cleanly unrolled. However, much like Erlang/Elixir, the multi-host parallelization is not cleanly implemented... yet...

		The core problem with HPC software has always been that academics are best modeled as hermit crabs with facilities. Once a lucky individual inherits a nice new shell, the pincers come out for all smaller entities who may approach with competing interests.

		Best of luck, =3

		"Crabs Trade Shells in the Strangest Way | BBC Earth" https://www.youtube.com/watch?v=f1dnocPQXDQ
	▲	j4k0bfr 14 minutes ago | parent | prev | next [-]
		I'm pretty interested in realtime computing and didn't realise C++ was considered bandwidth efficient! Coming from C, I find myself avoiding most 'new' C++ features because I can't easily figure out how they allocate without grabbing a memory profiler.
	▲	bruce343434 22 minutes ago | parent | prev [-]
		What does it mean to be friendly to memory bandwidth, and why does C++ excel at it, over, say, Fortran or C or Rust?

jpecar 18 minutes ago | parent | prev | next [-]

These fancy HPC languages are all nice and dandy, but the hard reality I see on our cluster is that most of the work is done in Python, R, and even Perl and awk. MPI has barely reached us, and people still prefer huge single machines to proper distributed computing. Yeah, bioinformatics is from another planet.

	▲	jpecar 10 minutes ago | parent [-]
		To add to this, what I see gaining traction are "workflow managers": tools that let people specify the flow of data through various tools. These can figure out how to parallelize things on their own, so users are not burdened with this task. So from what I see, the actual programming language doesn't matter as much as how the work is organized. Anything that helps people simplify this task is of immediate benefit to the science.
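
A minimal sketch of how a workflow manager might derive parallelism from a data-flow description, assuming each step simply lists the steps whose output it consumes. It is not modeled on any particular tool: `Workflow`, `plan_waves`, and the step names in the usage note are hypothetical. It groups steps into "waves" (Kahn-style topological layering), where every step in a wave depends only on earlier waves and could therefore run concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Maps a step name to the names of the steps whose output it consumes.
using Workflow = std::map<std::string, std::vector<std::string>>;

// Groups steps into waves that may each run in parallel.
// Returns an empty result for an empty workflow, a cycle,
// or a dependency on an undeclared step.
std::vector<std::vector<std::string>> plan_waves(const Workflow& wf) {
    std::map<std::string, std::size_t> pending;                   // unmet dependencies per step
    std::map<std::string, std::vector<std::string>> consumers;    // step -> steps waiting on it
    for (const auto& entry : wf) {
        pending[entry.first] = entry.second.size();
        for (const std::string& dep : entry.second) {
            if (wf.count(dep) == 0) return {};                    // undeclared dependency
            consumers[dep].push_back(entry.first);
        }
    }

    // The first wave holds every step with no dependencies.
    std::vector<std::string> ready;
    for (const auto& entry : pending) {
        if (entry.second == 0) ready.push_back(entry.first);
    }

    std::vector<std::vector<std::string>> waves;
    std::size_t scheduled = 0;
    while (!ready.empty()) {
        scheduled += ready.size();
        std::vector<std::string> next;
        for (const std::string& step : ready) {
            // Finishing this step may unblock the steps that consume its output.
            for (const std::string& consumer : consumers[step]) {
                if (--pending[consumer] == 0) next.push_back(consumer);
            }
        }
        waves.push_back(ready);
        ready = std::move(next);
    }

    if (scheduled != wf.size()) return {};                        // a cycle left steps unscheduled
    return waves;
}
```

For a workflow where `trim_a` and `trim_b` have no inputs, `align_a` consumes `trim_a`, `align_b` consumes `trim_b`, and `merge` consumes both alignments, this yields three waves: the two trims, then the two alignments, then the merge. The user only describes the data flow; the parallel structure falls out of the dependency graph.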

riffraff an hour ago | parent | prev | next [-]

Perhaps one issue lacking discussion in the article is how easy it is to find devs?

I've never worked in HPC but it seems it should be relatively simple to find a C/C++ dev that can pick up OpenMP, or one that already knows it, compared to hiring people who know Chapel.

The "scaling down" factor (how easy or interesting it is to use tool X for small use) seems a disadvantage of HPC-only languages, which creates a barrier to entry and a reduction in available workforce.

	▲	kinow 43 minutes ago | parent [-]
		I think HPC devs need an extra set of skills that are not so common, such as parallel file systems, batch schedulers, NUMA, InfiniBand, and probably some domain-specific knowledge for the apps they will develop. That domain knowledge is also probably a bit niche: climate modelling, earthquake simulation, lidar data processing, and so on. And even knowing OpenMP or MPI may not suffice if the site uses older versions or heterogeneous approaches with CUDA, FPGAs, etc. Knowing the language and the shared/distributed memory libraries helps, but if your project needs a new senior dev then it may be a bit hard to find one (although the popularity of the company/HPC site, salary, and location also play a role).

swiftcoder 35 minutes ago | parent | prev [-]

It's interesting that none of the actor-based languages ever made it into this space. Feels like something with the design philosophy of Erlang would be pretty suitable to exploit millions of cores and a variety of interconnects...