Remix.run Logo
SatvikBeri a day ago

> Python is sometimes slower (hot loops), but for that you have Numba

This is a huge understatement. At the hedge fund I work at, I learned Julia by porting a heavily optimized Python pipeline. Hundreds of hours had gone into the Python version – it was essentially entirely glue code over C.

In about two weeks of learning Julia, I ported the pipeline and got it 14x faster. This was worth multiple senior FTE salaries. With the same amount of effort, my coworkers – who are much better engineers than I am – had not managed to get any significant part of the pipeline onto Numba.

> And if something is truly performance critical, it should be written or rewritten in C++ anyway.

Part of our interview process is a take-home where we ask candidates to build the fastest version of a pipeline they possibly can. People usually use C++ or Julia. All of the fastest answers are in Julia.

sbrother 21 hours ago | parent | next [-]

> People usually use C++ or Julia. All of the fastest answers are in Julia

That's surprising to me and piques my interest. What sort of pipeline is this that's faster in Julia than C++? Does Julia automatically use something like SIMD or other array magic that C++ doesn't?

jakobnissen 19 hours ago | parent | next [-]

I use Rust instead of C++, but I also see my Julia code being faster than my Rust code.

In my view, it's not that Julia itself is faster than Rust - on the contrary, Rust as a language is faster than Julia. However, Julia's prototyping, iteration speed, benchmarking, profiling and observability is better. By the time I would have written the first working Rust version, I would have written it in Julia, profiled it, maybe changed part of the algorithm, and optimised it. Also, Julia makes more heavy use of generics than Rust, which often leads to better code specialization.

There are some ways in which Julia produces better machine code that Rust, but they're usually not decisive, and there are more ways in which Rust produces better machine code than Julia. Also, the performance ceiling for Rust is better because Rust allows you to do more advanced, low level optimisations than Julia.

SatvikBeri 9 hours ago | parent [-]

This is pretty much it – when we had follow up interviews with the C++ devs, they had usually only had time to try one or two high-level approaches, and then do a bit of profiling & iteration. The Julia devs had time to try several approaches and do much more detailed profiling.

SatvikBeri 10 hours ago | parent | prev | next [-]

To be clear, the fastest theoretically possible C++ is probably faster than the fastest theoretically possible Julia. But the fastest C++ that Senior Data Engineer candidates would write in ~2 hours was slower than the fastest Julia (though still pretty fast! The benchmark for this problem was 10ms, and the fastest C++ answer was 3 ms, and the top two Julia answers were 2.3ms and .21ms)

The pipeline was pretty heavily focused on mathematical calculations – something like, given a large set of trading signals, calculate a bunch of stats for those signals. All the best Julia and C++ answers used SIMD.

adgjlsfhk1 20 hours ago | parent | prev [-]

The main thing is just that Julia has a standard library that works with you rather than working against you. The built in sort will use radix sort where appropriate and a highly optimized quicksort otherwise. You get built in matrices and higher dimensional arrays with optimized BLAS/LaPack configured for you (and CSC+structured sparse matrices). You get complex and rational numbers, and a calling convention (pass by sharing) which is the fast one by default 90% of the time instead of being slow (copying) 90% of the time. You have a built in package manager that doesn't require special configuration, that also lets you install GPU libraries that make it trivial to run generic code on all sorts of accelerators.

Everything you can do in Julia you can do in C++, but lots of projects that would take a week in C++ can be done in an hour in Julia.

pbowyer 17 hours ago | parent | prev | next [-]

> Part of our interview process is a take-home where we ask candidates to build the fastest version of a pipeline they possibly can. People usually use C++ or Julia. All of the fastest answers are in Julia.

It would be fun if you could share a similar pipeline problem to your take-home (I know you can't share what's in your interview). I started off in scientific Python in 2003 and like noodling around with new programming languages, and it's great to have challenges like this to work through. I enjoyed the 1BRC problem in 2024.

SatvikBeri 9 hours ago | parent [-]

The closest publicly available problem I can think of is the 1 billion rows challenge. It's got a bigger dataset, but with somewhat simpler statistics – though the core engineering challenges are very similar.

https://github.com/gunnarmorling/1brc

drnick1 21 hours ago | parent | prev [-]

The C++ devs at your firm must be absolutely terrible if a newcomer using a scripting language can write faster software, or you are not telling the whole story. All of NumPy, Julia, MATLAB, R, and similar domain-specific, user-friendly libraries and platforms use BLAS and LAPACK for numerical calculations under the hood with some overhead depending on the implementation, so a reasonably optimized native implementation should always be faster. By the looks of it the C++ code wasn't compiled with -O3 if it can be trivially beaten by Julia.

SatvikBeri 10 hours ago | parent | next [-]

Are you aware that Julia is a compiled language with a heavy focus on performance? It is not in the same category as NumPy/MATLAB/R

postflopclarity 4 hours ago | parent | prev [-]

Julia is not a scripting language and can match C performance on many tasks.