pron 11 hours ago

I was surprised to see that Java was slower than C++, but the Java code is run with `-XX:+UseSerialGC`, which is the slowest GC, meant to be used only on very small systems and to optimise for memory footprint more than performance. Also, no heap size is set, which makes it hard to know what exactly is being measured. Java allows trading off CPU for RAM and vice versa. The results would be more meaningful if an appropriate GC were used (Parallel, for this batch job) and tested with different heap sizes. If the rules say the program should take less than 8GB of RAM, then it's best to configure the heap to 8GB (or a little lower). Also, System.gc() shouldn't be invoked.

I don't know if that would make a difference, but that's how I'd run it, because in Java, the heap/GC configuration is an important part of the program and how it's actually executed.
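To check which collector and heap limit a run actually ends up with, a small program can ask the JVM itself. This is a minimal sketch assuming a HotSpot JVM; the class name `GcInfo` is hypothetical, and the 8g figure only mirrors the 8GB rule mentioned above. `-XX:+DisableExplicitGC` is a standard HotSpot flag that turns `System.gc()` calls into no-ops. It is shown here only as an alternative to removing those calls from the code.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that reports the collectors and heap limit the JVM is really using.
// Example launch (standard HotSpot flags; 8g assumed from the benchmark's RAM rule):
//   java -XX:+UseParallelGC -Xms8g -Xmx8g -XX:+DisableExplicitGC GcInfo
public class GcInfo {
    // Names of the active collector beans; the exact names vary by collector and JDK.
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    // Maximum heap the JVM will attempt to use, in MiB (close to, not always equal to, -Xmx).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Collectors: " + collectorNames());
        System.out.println("Max heap:   " + maxHeapMiB() + " MiB");
    }
}
```

Running it once with `-XX:+UseSerialGC` and once with `-XX:+UseParallelGC` should make the difference in configuration visible before any timing is done.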

Of course, the most recent JDK version should be used (and, I guess, the most recent compiler version for all languages).

rockwotj 9 hours ago

It's so hard to actually benchmark languages because so much depends on the dataset. I am pretty sure that with simdjson and some tricks I could write C++ (or Rust) that would top the leaderboard (see some of the techniques from the billion row challenge!).

Tbh, for silly benchmarks like this it will ultimately be hard to beat a language that compiles ahead of time to machine code, due to jit warmup etc.

It's hard to do benchmarks right. For example, are you testing IO performance? Are OS caches flushed between language runs? What kind of disk is used? Performance does not exist in a vacuum of just the language or algorithm.

pron 7 hours ago

> due to jit warmup

I think this harness actually uses JMH, which measures after warmup.
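JMH's core idea is to run the code enough times for the JIT to compile it before timing anything. What follows is a deliberately simplified, hand-rolled illustration of that idea. It is not JMH itself and not how this harness is written. `WarmupSketch`, `workload`, and the iteration counts are hypothetical. Real JMH additionally forks fresh JVMs, guards against dead-code elimination with its `Blackhole`, and reports statistics.

```java
import java.util.function.LongSupplier;

// Hand-rolled warmup-then-measure loop for illustration only; NOT JMH.
public class WarmupSketch {
    // Written after each measurement so the JIT cannot treat the work as dead code.
    static volatile long blackhole;

    // Hypothetical workload standing in for the benchmark's real job.
    static long workload() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i % 7;
        }
        return sum;
    }

    // Runs `task` warmupIters times without timing it (giving the JIT a chance to
    // compile it), then returns the average nanoseconds over measuredIters timed runs.
    static double averageNanos(LongSupplier task, int warmupIters, int measuredIters) {
        long sink = 0;
        for (int i = 0; i < warmupIters; i++) {
            sink += task.getAsLong();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredIters; i++) {
            sink += task.getAsLong();
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return (double) elapsed / measuredIters;
    }

    public static void main(String[] args) {
        double cold = averageNanos(WarmupSketch::workload, 0, 5);
        double warm = averageNanos(WarmupSketch::workload, 50, 5);
        System.out.printf("no warmup: %.0f ns/op, after warmup: %.0f ns/op%n", cold, warm);
    }
}
```

Because the cold measurement runs first, it includes interpreted and partly compiled iterations. The printed numbers are only illustrative and will vary by machine and JDK.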

KerrAvon 8 hours ago

Why are you surprised? Java always suffers from an abstraction penalty for running on a VM. You should be surprised (and skeptical) if Java ever beats C++ on any benchmark.

pron 7 hours ago

The only "abstraction penalty" of "running on a VM" (by which I think you mean using a JIT compiler) is the warmup time spent waiting for the JIT.

andersmurphy 4 hours ago

It's a sign of our times that this is getting downvoted. JIT is so underrated.

stefs an hour ago

In my opinion, this assertion suffers somewhat from the "sufficiently smart compiler" fallacy.

https://wiki.c2.com/?SufficientlySmartCompiler

sswatson an hour ago

The linked article makes a specific carveout for Java, on the grounds that its SufficientlySmartCompiler is real, not hypothetical.

remexre an hour ago

C++ certainly also has, and needs, a similarly sufficiently smart compiler to be compiled at all…

woooooo 8 hours ago

For the most naive code, if you're calling "new" multiple times per row, maybe Java benefits from out-of-band GC while C++ calls destructors and free() inline as things go out of scope?

Of course, if you're optimizing, you'll reuse buffers and objects in either language.
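The buffer-reuse point can be illustrated in Java with a hypothetical row-summing task. The first version allocates a fresh array and fresh `String`s per row via `String.split`, leaving them for the GC. The second parses into one preallocated `long[]` that is reused for every row. This is a sketch that assumes comma-separated, non-negative integer fields and at most 16 fields per row. `RowParsing`, `parseInto`, and the other names are made up for illustration.

```java
import java.util.List;

public class RowParsing {
    // Allocating version: split() creates a new String[] plus one String per field
    // for every row, and all of it becomes garbage for the collector.
    static long sumAllocating(List<String> rows) {
        long total = 0;
        for (String row : rows) {
            for (String field : row.split(",")) {
                total += Long.parseLong(field);
            }
        }
        return total;
    }

    // Parses comma-separated non-negative integers from `row` into the caller's
    // `fields` buffer and returns how many were written. Assumes well-formed input.
    static int parseInto(String row, long[] fields) {
        int count = 0;
        long value = 0;
        for (int i = 0; i < row.length(); i++) {
            char c = row.charAt(i);
            if (c == ',') {
                fields[count++] = value;
                value = 0;
            } else {
                value = value * 10 + (c - '0');
            }
        }
        fields[count++] = value;
        return count;
    }

    // Reusing version: one buffer is allocated up front and overwritten for each row.
    static long sumReusing(List<String> rows) {
        long[] fields = new long[16]; // assumes at most 16 fields per row
        long total = 0;
        for (String row : rows) {
            int n = parseInto(row, fields);
            for (int i = 0; i < n; i++) {
                total += fields[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("1,2,3", "10,20,30");
        System.out.println(sumAllocating(rows)); // 66
        System.out.println(sumReusing(rows));    // 66
    }
}
```

Both versions compute the same total. The difference is only in how much short-lived garbage each row produces.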

cryptos 2 hours ago

In the end, even Java code becomes machine code at some point (at least the hot paths).
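One way to watch hot paths become machine code is HotSpot's `-XX:+PrintCompilation` flag, which logs methods as the JIT compiles them. Below is a minimal sketch with a hypothetical `HotPath` class. After compiling it with `javac`, running `java -XX:+PrintCompilation HotPath` should typically show lines mentioning `HotPath::mix` and/or `HotPath::run` (possibly as an on-stack-replacement compile). The exact output format varies by JDK version and tiered-compilation settings.

```java
// Hypothetical example; run with: java -XX:+PrintCompilation HotPath
public class HotPath {
    // Small, frequently called method: a typical candidate for JIT compilation.
    static int mix(int x) {
        return (x * 31) ^ (x >>> 7);
    }

    // Hot loop that calls mix() many times; int overflow simply wraps.
    static int run(int iterations) {
        int acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += mix(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        // Enough iterations for HotSpot to decide these methods are hot.
        System.out.println(run(10_000_000));
    }
}
```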

stefs an hour ago

Yes, but that's just one part of the equation. Machine code from compiler and/or language A is not necessarily the same as machine code from compiler and/or language B. The reasons include, among others, contextual information, the handling of undefined behavior, and memory access issues.

You can compile many weakly typed high-level languages to machine code and their performance will still suck.

Java's language design simply prohibits some optimizations that are possible in other languages (and also enables some that aren't possible in others).
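Two concrete Java rules lie behind the "undefined behavior and memory access" point. First, `int` arithmetic is defined to wrap around. Second, every array access is bounds-checked. In C++, by contrast, signed overflow is undefined behavior, which optimizing compilers typically exploit to fold comparisons such as `x + 1 > x` to `true`, and out-of-bounds access is unchecked. This is a small sketch, and the class and method names are hypothetical.

```java
public class DefinedBehavior {
    // Java defines int overflow as two's-complement wraparound, so the compiler
    // may not assume x + 1 > x; for Integer.MAX_VALUE this returns false.
    static boolean plusOneIsGreater(int x) {
        return x + 1 > x;
    }

    // Every array access is bounds-checked: an out-of-range index throws instead of
    // reading arbitrary memory. The JIT may remove checks it proves redundant, but
    // must preserve this behavior.
    static boolean indexThrows(int[] a, int i) {
        try {
            int unused = a[i];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(plusOneIsGreater(Integer.MAX_VALUE)); // false
        System.out.println(indexThrows(new int[3], 3));          // true
    }
}
```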