jpecar 2 hours ago

All these fancy HPC languages are nice and dandy, but the hard reality I see on our cluster is that most of the work is done in Python, R, and even Perl and awk. MPI barely reached us, and people still prefer huge single machines to proper distributed computing. Yeah, bioinformatics is from another planet.

jltsiren an hour ago

Bioinformatics is an outlier within HPC. It's less about numerical computing and more about processing string data with weird algorithms and data structures that are rarely used anywhere else.

Distributed computing never really took off in bioinformatics, because most tasks are conveniently small. For example, a human genome is small enough that you can run most tasks involving a single genome on an average cost-effective server in a reasonable time. And that was already true 10–15 years ago. And if you have a lot of data, it usually means that you have many independent tasks.
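A minimal sketch of the "many independent tasks" pattern, in Python since that's what most people on these clusters already write. The per-sample function gc_content and the helper process_samples are hypothetical stand-ins for real per-genome work. The point is that samples never talk to each other, so a plain process pool is enough and no MPI-style communication is involved.

```python
from concurrent.futures import ProcessPoolExecutor


def gc_content(sequence: str) -> float:
    """Fraction of G/C bases; a hypothetical stand-in for a real per-genome task."""
    if not sequence:
        return 0.0
    gc = sum(1 for base in sequence.upper() if base in "GC")
    return gc / len(sequence)


def process_samples(samples: dict[str, str], max_workers: int = 4) -> dict[str, float]:
    """Run gc_content on every sample in parallel.

    Each sample is independent: workers exchange no messages,
    unlike the ranks of an MPI job.
    """
    names = list(samples)
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # map preserves input order, so results line up with names
        results = pool.map(gc_content, (samples[name] for name in names))
        return dict(zip(names, results))


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this module.
    demo = {"sample_a": "ACGTGC", "sample_b": "ATATAT"}
    print(process_samples(demo))
```

On a cluster the same shape often shows up one level higher, as one scheduler job per sample, but the idea is the same: parallelism across samples, not within one.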

Which is nice from the perspective of a tool developer. You don't have to deal with the bureaucracy of distributed computing, as it's the user's responsibility.

C++ is popular for developing bioinformatics tools. Some core tools are written in C, but actual C developers are rare. And Rust has become popular with new projects, so much so that I haven't really seen C++20 or newer in the field.

jpecar 2 hours ago

To add to this, what I see gaining traction are "workflow managers": tools that let people specify the flow of data through various tools. These can figure out how to parallelize things on their own, so users are not burdened with this task.
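A toy illustration of what a workflow manager can do with a declared data flow, assuming a hypothetical pipeline where two samples are aligned and variant-called independently and then merged. It uses Python's standard graphlib.TopologicalSorter to group tasks into waves that could run concurrently. This is a sketch of the idea, not how any particular workflow manager is implemented.

```python
from graphlib import TopologicalSorter


def parallel_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave depends only on earlier waves,
    so a scheduler could run a whole wave concurrently."""
    sorter = TopologicalSorter(dependencies)  # maps task -> tasks it depends on
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave finished
    return waves


# Hypothetical pipeline: task names are illustrative, not from any real tool.
pipeline = {
    "align_s1": set(),
    "align_s2": set(),
    "call_s1": {"align_s1"},
    "call_s2": {"align_s2"},
    "merge": {"call_s1", "call_s2"},
}

if __name__ == "__main__":
    for i, wave in enumerate(parallel_waves(pipeline)):
        print(i, wave)
```

Real workflow managers typically launch each task as soon as its own inputs exist rather than waiting for a whole wave, but the waves make it easy to see where the parallelism comes from: the user only declared dependencies, never threads or processes.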

So from what I see, the actual programming language doesn't matter as much as how the work is organized. Anything that helps people simplify this task is of immediate benefit to the science.

jkh1 31 minutes ago

Most of the time in bio-related fields, we need high-throughput computing not high-performance computing.