deepsun 5 days ago

> it fills RAM/cache with extra Object stuff

But it has a much smaller impact than in, say, Rust, where it's really an allocation (asking the kernel for more RAM). Java's Object "allocation" happens in its own heap, which is a big chunk of RAM already pre-allocated from the kernel. So in the JVM boxing is really cheap, often just a pointer increment. Also, oftentimes the wrapper object and its value are located near each other in the same memory page, so we're not adding a RAM access, more like an L2 access.
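A minimal sketch of the boxing being discussed. One detail worth knowing: autoboxing goes through `Integer.valueOf`, and the language spec guarantees a preallocated cache for values in -128..127, so small boxed values don't even allocate (larger ones do, on a default JVM):

```java
// Sketch: autoboxing allocates wrapper objects, but small values come
// from a preallocated cache (the JLS guarantees -128..127).
public class BoxingDemo {
    public static void main(String[] args) {
        Integer small1 = 127, small2 = 127; // boxed via Integer.valueOf: cached
        Integer big1 = 1000, big2 = 1000;   // outside the cache: fresh heap objects

        System.out.println(small1 == small2);  // true  (same cached object)
        System.out.println(big1 == big2);      // false (two separate allocations)
        System.out.println(big1.equals(big2)); // true  (value equality)
    }
}
```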

PS: In some cases it can be even faster than Rust or C++, because it pre-allocates larger pages and drops them in chunks via generational GC (e.g. all values "allocated" to process one HTTP request can be GCed together right after), while C++ eagerly destructs each object immediately. Also, a GC sweep can happen on a separate thread, not bothering the main thread the user is waiting on. One can do the same in Rust and C++ using arenas, of course.
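The "pointer increment" allocation and the arena idea can be sketched in a few lines. This is a toy illustration, not a real JVM API — all the names are made up — but the mechanics are the same shape as a TLAB bump or a C++/Rust arena:

```java
import java.nio.ByteBuffer;

// Toy bump arena: one big upfront allocation, after which each "alloc"
// is just a pointer increment, and dropping everything at once is O(1).
final class BumpArena {
    private final ByteBuffer chunk;

    BumpArena(int capacity) {
        chunk = ByteBuffer.allocate(capacity); // one upfront request
    }

    // "Allocate" size bytes: bump the position, return the old offset.
    int alloc(int size) {
        int offset = chunk.position();
        chunk.position(offset + size); // throws if the arena is exhausted
        return offset;
    }

    // Dropping the whole arena is just resetting the pointer.
    void reset() {
        chunk.clear();
    }

    public static void main(String[] args) {
        BumpArena arena = new BumpArena(1024);
        System.out.println(arena.alloc(16)); // 0
        System.out.println(arena.alloc(16)); // 16
        arena.reset();                       // e.g. end of an HTTP request
        System.out.println(arena.alloc(8));  // 0 again
    }
}
```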

scottlamb 5 days ago | parent | next [-]

> But it has much lesser impact than in, say, Rust, where it's really an allocation (asking kernel for more RAM). Java's Object "allocation" happens in its own heap, which is a big chunk of RAM already pre-allocated from kernel.

What? No. Rust doesn't ask the kernel for each allocation individually. That'd be insane; besides the crushing system call overhead, the kernel only does allocations in whole pages (4 KiB on x86-64) so it'd be incredibly wasteful of RAM.

Rust does the same thing as virtually every non-GCed language. It uses a memory allocator [1] that does bookkeeping in userspace and asks the kernel for big chunks via sbrk and/or mmap/munmap. Probably not the whole heap as a single chunk of virtual memory as in Java, but much closer to that than to the other extreme of a separate kernel call for each allocation.

[1] by default, just libc's malloc and free, although you can override this, and many people choose jemalloc or mimalloc instead. My high-level description applies equally well to any of the three.
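To make "bookkeeping in userspace" concrete, here is a cartoon of what a malloc-style allocator does, written in Java for consistency with the rest of the thread. It grabs one big "page" up front and then hands out and reclaims fixed-size blocks with a free list — no further "kernel" calls. Names and sizes are illustrative only:

```java
import java.util.ArrayDeque;

// Toy malloc-style allocator: one big chunk up front, then all
// bookkeeping happens in userspace via a free list.
final class ToyAllocator {
    private static final int BLOCK = 64;        // fixed-size blocks, for simplicity
    private final byte[] heap = new byte[4096]; // "one big chunk from the kernel"
    private final ArrayDeque<Integer> freeList = new ArrayDeque<>();

    ToyAllocator() {
        for (int off = 0; off < heap.length; off += BLOCK)
            freeList.push(off); // every block starts out free
    }

    int malloc() {          // hand out a free block: pure userspace bookkeeping
        return freeList.pop();
    }

    void free(int offset) { // return the block to the free list; no syscall
        freeList.push(offset);
    }

    public static void main(String[] args) {
        ToyAllocator a = new ToyAllocator();
        int p = a.malloc();
        a.free(p);
        int q = a.malloc();
        System.out.println(p == q); // freed block is reused immediately
    }
}
```

Real allocators handle variable sizes, size classes, and returning memory to the OS, but the division of labor — rare big requests to the kernel, cheap per-allocation bookkeeping in userspace — is the same.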

gf000 4 days ago | parent | next [-]

While Java just does a thread-local pointer bump, which will still be more efficient, and closer to stack allocation.

Of course you can optimize better with Rust/C++/etc., but it is not trivial, and you definitely don't get it out of the box for free. My point is that how much overhead Java has is a bit overblown.

deepsun 4 days ago | parent | prev | next [-]

Yes, my mistake, thanks for pointing it out, upvoting. I meant asking the memory allocator, not the kernel.

I meant that Java usually already has that memory allocated; the JVM is a memory allocator of its own, operating within its own heap. One can do that in Rust of course (or more easily in Zig, as it welcomes passing an allocator around); it's just built into the JVM already. The drawback is that it's harder to release that memory back from the JVM process, so Java apps (not AOT-compiled) usually consume more RAM.

labadal 4 days ago | parent | prev [-]

I'm glad that I'm over that phase I had in university where I wanted to write a custom memory allocator for everything because "I understand my usage better". I will admit that it was a good bit of fun though.

IshKebab 4 days ago | parent | prev | next [-]

Aside from Rust not working like that (as scottlamb said), Rust is faster than Java even if Java has a faster allocator because Rust code usually does much less allocation in the first place.

jeroenhd 4 days ago | parent [-]

I don't know if Rust code allocates more or less in general. It really depends on what kind of code you write. Once Rust code reaches the complexity of the Java stacks it's replacing, you get a lot of wrapper objects, locks, and intermediates to cross thread boundaries and to prove soundness to the borrow checker.

I recently encountered an example of someone writing a Rust version of a popular Java library by just taking the Java code, commenting it out, and writing the Rust equivalent almost line for line. The approach works great (no need to reinvent the wheel and you can point to the existing documentation and code samples) but in terms of allocations, you're not going to find many improvements.

There's a type of Java code that looks more like C code than anything else and runs blazing fast with minimal overhead. It's not the type of Java code you'll probably encounter when writing Java applications, but if you use Java as a kind of cross-platform C target, you can get pretty close to Rust (and for some use cases even beat it). Java has a LOT of tricks up its sleeve (pointer compression, dynamic realignment) that Rust can't automatically take advantage of.

Your AbstractFunctorClassFactoryProducer isn't going to be very allocation efficient, but once you start seeing volatile ints all over the place, things quickly become a lot faster.

Mawr 2 days ago | parent | prev [-]

> So in JVM boxing is really cheap, often just a pointer increment.

Nonsense. Sure, the act of creating a new allocation is cheap, but that's not where the expense lies at all. And of course, to make allocations cheap, you needed to give up something else.

Each allocation needs to be at some point handled by the GC, so the more allocations, the more GC pressure. That then forces you to make your GC generational, which constrains your GC design. You end up with a slower GC no matter what you do.

Moreover, every access to that allocation goes through a pointer, which is basically a guaranteed cache miss.

The classic case of this is iterating over an Integer[]. Not only did you have to make n allocations that the GC now has to keep track of, but it's also impossible to fetch the items of this array from memory efficiently. You need to fetch the pointers first, then request the pointed-to memory. Even in the best possible case of each pointer living right next to the data it points to, you're still paying the overhead of extra memory taking up space in your L1 cache.

Compare with an array of simple integers. Zero GC overhead, zero pointers, zero memory overhead, trivial to prefetch.
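A sketch of the two layouts side by side. The int[] holds its values contiguously in one allocation; the Integer[] holds pointers to n separate objects, so every element read is an extra indirection the GC also has to track (no timing here — just the allocation and layout difference):

```java
// Same sum over boxed vs primitive arrays.
public class BoxedVsPrimitive {
    public static void main(String[] args) {
        int n = 1_000_000;

        Integer[] boxed = new Integer[n]; // n + 1 allocations: array + wrappers
        int[] primitive = new int[n];     // 1 allocation, flat contiguous layout
        for (int i = 0; i < n; i++) {
            boxed[i] = i;                 // each store may allocate a wrapper
            primitive[i] = i;
        }

        long sumBoxed = 0, sumPrim = 0;
        for (Integer v : boxed) sumBoxed += v; // pointer chase + unbox per element
        for (int v : primitive) sumPrim += v;  // sequential, prefetch-friendly
        System.out.println(sumBoxed == sumPrim); // same result, very different cost
    }
}
```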

---

This is a horrid approach to design. Making allocation cheap must come at the cost of slowing down some other part of the GC, and it just encourages programmers to allocate more, which puts more and more pressure on the GC, making it even slower. It's a self-defeating approach, and therefore completely boneheaded.

Allocations should be expensive!

With expensive allocations you get breathing room in your GC to make it otherwise faster. Moreover, since programmers are now encouraged not to allocate, the GC pressure is lower, making it even faster. But it gets even better. Optimizing becomes very clear and straightforward - just reduce allocations. This is great because it allows for targeted optimizations - if a particular piece of code is slow, just reduce its allocation rate and you'll get a nice speedup. Very easy to reason about.
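The "just reduce allocations" style of targeted optimization often looks like replacing a per-call allocation with a reused buffer. A minimal sketch (single-threaded only — the shared builder is deliberately not thread-safe, and the names are illustrative):

```java
// Sketch of "optimize by allocating less": a fresh StringBuilder per call
// vs one reused builder. Both produce the same string; the second reuses
// the builder's backing array instead of allocating a new one each time.
public class ReuseBuffer {
    static String perCall(int i) {
        return new StringBuilder().append("row ").append(i).toString();
    }

    private static final StringBuilder SHARED = new StringBuilder();

    static String reused(int i) {
        SHARED.setLength(0); // reset length, keep the backing array
        return SHARED.append("row ").append(i).toString();
    }

    public static void main(String[] args) {
        System.out.println(perCall(7).equals(reused(7)));
    }
}
```

(The `toString()` still allocates the result String in both versions; the saving is the builder and its growth reallocations, which is exactly the kind of win an allocation profiler points you at.)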

That's why code written in C/C++/Go/Rust tends to be so conscious of allocations and indirections, while Java is full of linked lists, arrays of pointers, allocations everywhere, and thousands of layers of indirection.

Cleaning up heaps of garbage can never be faster than not creating the garbage in the first place.