pron 3 hours ago

Running Java workloads is very important for most CPUs these days, and both ARM and Intel consult with the Java team on new features (although Java's needs aren't much different from those of C++). But while you're right that with modern JITs, executing Java bytecode directly isn't too helpful, our concurrent collectors are already very efficient (they could, perhaps, take advantage of new address masking features).

I think there's some disconnect between how people imagine GCs work and how the JVM's newest garbage collectors actually work. Rather than exacting a performance cost, they're more often a performance boost compared to more manual or eager memory management techniques, especially for the workloads of large, concurrent servers. The only real cost is in memory footprint, but even that is often misunderstood, as covered beautifully in this recent ISMM talk (that I would recommend to anyone interested in memory management of any kind): https://youtu.be/mLNFVNXbw7I. The key is that moving-tracing collectors can turn available RAM into CPU cycles, and some memory management techniques under-utilise available RAM.
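To make the "RAM into CPU cycles" point concrete, here is a rough sketch of a simplified textbook-style cost model. It is an assumption for illustration, not necessarily the argument made in the talk: a tracing collector does work proportional to the live set on each cycle, and a cycle runs each time the free part of the heap has been allocated. The class and method names are hypothetical.

```java
// Hypothetical model, not a measurement: each collection traces the live set,
// and a collection is triggered every time (heap - live) bytes are allocated.
public final class GcCostModel {

    /** Units of tracing work per unit allocated, for a given heap and live-set size. */
    static double tracedPerAllocatedByte(double heapBytes, double liveBytes) {
        if (liveBytes < 0 || heapBytes <= liveBytes) {
            throw new IllegalArgumentException("heap must be larger than the live set");
        }
        return liveBytes / (heapBytes - liveBytes);
    }

    public static void main(String[] args) {
        double live = 1.0; // live set in arbitrary units (e.g. GiB), assumed constant
        for (double heap : new double[] {1.5, 2.0, 4.0, 8.0}) {
            System.out.printf("heap=%.1f live=%.1f -> %.3f traced per allocated%n",
                    heap, live, tracedPerAllocatedByte(heap, live));
        }
    }
}
```

Under this model, with the live set held fixed, growing the heap from 2 to 4 units cuts the tracing work per allocated byte from 1 to about 0.33, which is the sense in which spare RAM buys back CPU.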

drob518 3 hours ago | parent | next [-]

So, the guys at Azul actually had this sort of business plan back in 2005, but they found that it was unsustainable and turned their attention to the software side, where they have done great work. I remember having a discussion with someone about Java processors and my comment was just “Lisp machines.” It’s very difficult to outperform code running on commodity processor architectures. That train is so big and moving so fast, you really have to pick your niche (e.g. GPUs) to deliver something that outperforms it. Too much investment ($$$ and brainpower) is flowing in that direction. Even if you’re successful for one generation, you need to grow sales and have multiple designs in the pipeline at once. It’s nearly impossible.

That said, I do see opportunities to add “assistance hardware” to commodity architectures. Given the massive shift to managed runtimes, all of which use GC, over the last couple decades, it’s shocking to me that nobody has added a “store barrier” instruction or something like that. You don’t need to process Java in hardware or even do full GC in hardware, but there are little helps you could give that would make a big difference, similar to what was done with “multimedia” and crypto instructions in x86 originally.
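For readers unfamiliar with what a store barrier does in software today, here is a minimal sketch of one common flavor, card marking, over a toy heap. Everything here (`ToyHeap`, `CARD_SHIFT`, `storeRef`) is a hypothetical illustration, not how any particular JVM implements it; real barriers are emitted inline by the JIT and vary by collector.

```java
import java.util.Arrays;

// Illustrative only: a software card-marking store barrier over a toy heap.
// All names and sizes are assumptions for the sketch, not HotSpot's.
public final class ToyHeap {
    static final int CARD_SHIFT = 9;   // assume each card covers 512 slots
    private final int[] slots;         // toy "reference fields": a slot index, or -1 for null
    private final byte[] cards;        // one dirty flag per card

    ToyHeap(int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        slots = new int[size];
        Arrays.fill(slots, -1);
        cards = new byte[((size - 1) >>> CARD_SHIFT) + 1];
    }

    /** Reference store plus barrier: the extra work is one shift and one byte store. */
    void storeRef(int fieldIndex, int target) {
        slots[fieldIndex] = target;            // the store the program asked for
        cards[fieldIndex >>> CARD_SHIFT] = 1;  // the barrier: mark the containing card dirty
    }

    /** True if the card containing this field has seen a reference store. */
    boolean isDirty(int fieldIndex) {
        return cards[fieldIndex >>> CARD_SHIFT] != 0;
    }

    /** Number of cards a collector would need to rescan. */
    int dirtyCards() {
        int n = 0;
        for (byte c : cards) n += c;
        return n;
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap(2048);
        heap.storeRef(5, 7);
        heap.storeRef(600, 1);
        System.out.println("dirty cards: " + heap.dirtyCards());
    }
}
```

The point of the sketch is that every reference store pays for the extra shift and byte write, which is the kind of small, hot operation a dedicated instruction could plausibly fold into the store itself.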

xmcqdpt2 2 hours ago | parent | prev [-]

> The only real cost is in memory footprint

There are also load and store barriers, which add work when accessing objects from the heap. In many cases, adding work in the parallel path is good if it allows you to avoid single-threaded sections, but not in all cases. Single-threaded programs with a lot of reads can be pretty significantly impacted by barriers:

https://rodrigo-bruno.github.io/mentoring/77998-Carlos-Gonca...

The Parallel GC is still useful sometimes!