| ▲ | kinow 2 hours ago | |
I think HPC devs need an extra set of skills that aren't so common: parallel file systems, batch schedulers, NUMA, InfiniBand, and probably some domain-specific knowledge for the apps they will develop. That domain knowledge also tends to be niche: climate modelling, earthquake simulation, lidar data processing, and so on. And even knowing OpenMP or MPI may not suffice if the site uses older versions or heterogeneous approaches with CUDA, FPGAs, etc. Knowing the language and the shared/distributed memory libraries helps, but if your project needs a new senior dev then one may be a bit hard to find (although the popularity of the company/HPC, salary, and location also play a role).
| ▲ | physicsguy an hour ago | |
You tend to only learn these things as they become a problem, too. It's super domain-specific and doesn't always translate between areas of research.

For example, when I worked on HPC simulation codes in magnetics, there was little point focusing on some of these areas because our codes were dominated by the cost of the long-range interaction, which limited compute scaling. All of our effort went into tuning those algorithms to the absolute max. We tried heterogeneous CPU + GPU but had very mixed results, and at that time (2010s) GPU memory wasn't large enough for the problems we cared about either.

I then moved to CFD in industry. The concerns there were totally different since everything is grid local. Partitioning over multiple GPUs is simple since only the boundaries need to be exchanged on each iteration. The problems there were much more on the memory bandwidth and parallel file system performance side.

Basically, you have to learn to solve whatever challenges get thrown up by the specific domain problem.

> And even knowing OpenMP or MPI may not suffice if the site uses older versions

To be fair, you always have the option of compiling a newer version yourself, but most people I met in academia didn't have the background to do this. Spack and EasyBuild make this much, much easier.
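To make the "only the boundaries need to be exchanged" point concrete, here is a minimal sketch of a halo (ghost cell) exchange on a 1D grid. It is a toy that runs in a single process, not the code described in the comment above: the averaging stencil, the function names, and the use of plain lists in place of per-GPU buffers are all assumptions made for illustration. In a real code each chunk would live on its own GPU or MPI rank and the exchange would be a device-to-device copy or a message.

```python
# Toy halo exchange: each "device" owns a contiguous chunk of a 1D grid and
# only needs one value from each neighbour per iteration.
# All names here are hypothetical; plain lists stand in for GPU buffers.


def reference_step(u):
    """One averaging sweep over the whole grid; the two end cells stay fixed."""
    last = len(u) - 1
    return [u[i] if i in (0, last) else (u[i - 1] + u[i + 1]) / 2.0
            for i in range(len(u))]


def split(u, parts):
    """Cut u into `parts` contiguous chunks (assumes 1 <= parts <= len(u))."""
    n = len(u)
    bounds = [n * p // parts for p in range(parts + 1)]
    return [u[bounds[p]:bounds[p + 1]] for p in range(parts)]


def exchange_halos(chunks):
    """Collect (left_ghost, right_ghost) for every chunk.

    This is the only communication per iteration: one value per side.
    None marks a physical boundary with no neighbour.
    """
    halos = []
    for p in range(len(chunks)):
        left = chunks[p - 1][-1] if p > 0 else None
        right = chunks[p + 1][0] if p < len(chunks) - 1 else None
        halos.append((left, right))
    return halos


def local_step(chunk, left, right):
    """Update one chunk using only its own cells plus the two ghost values."""
    padded = [left] + chunk + [right]
    out = []
    for i in range(1, len(padded) - 1):
        if padded[i - 1] is None or padded[i + 1] is None:
            out.append(padded[i])  # physical boundary cell: held fixed
        else:
            out.append((padded[i - 1] + padded[i + 1]) / 2.0)
    return out


def run_partitioned(u, parts, iters):
    """Iterate: exchange boundaries, then update every chunk independently."""
    chunks = split(u, parts)
    for _ in range(iters):
        halos = exchange_halos(chunks)
        chunks = [local_step(c, l, r) for c, (l, r) in zip(chunks, halos)]
    return [x for c in chunks for x in c]


def run_reference(u, iters):
    """Same iteration on the undivided grid, for comparison."""
    for _ in range(iters):
        u = reference_step(u)
    return u


if __name__ == "__main__":
    u0 = [0.0] * 11 + [100.0]
    assert run_partitioned(u0, 3, 50) == run_reference(u0, 50)
    print("partitioned result matches the single-domain result")
```

The per-iteration communication is two values per chunk regardless of chunk size, which is why this kind of grid-local problem partitions so easily. A long-range interaction has no such property, since every cell needs data from every other cell.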